    [N] Meta/Facebook releases CM3leon, a more efficient, state-of-the-art generative model for text and images
    Abstract We present CM3Leon (pronounced “Chameleon”), a retrieval-augmented, token-based, decoder-only multi-modal language model capable of generating and infilling both text and images. CM3Leon uses the CM3 multi-modal architecture but additionally shows the extreme benefits of scaling up and tuning on more diverse instruction-style data. It is the first multi-modal model trained with a recipe adapted from text-only language models, including a large-scale retrieval-augmented pretraining stage and a second multi-task supervised fine-tuning (SFT) stage. It is also a general-purpose model that can do both text-to-image and image-to-text generation, allowing us to introduce self-contained contrastive decoding methods that produce high-quality outputs. Extensive experiments demonstrate that this recipe is highly effective for multi-modal models. CM3Leon achieves state-of-the-art performance in text-to-image generation with 5x less training compute than comparable methods (zero-shot MS-COCO FID of 4.88). After SFT, CM3Leon can also demonstrate unprecedented levels of controllability in tasks ranging from language-guided image editing to image-controlled generation and segmentation. Paper: https://scontent-sjc3-1.xx.fbcdn.net/v/t39.2365-6/358725877_789390529544546_1176484804732743296_n.pdf?_nc_cat=108&ccb=1-7&_nc_sid=3c67a6&_nc_ohc=_diQr9c6Ru8AX9PYkNd&_nc_ht=scontent-sjc3-1.xx&oh=00_AfArA2t1OLRfRPioK9qkuBA6IhhSjbQ-b3weo2PM5AYLdw&oe=64B754F2 Blog: https://ai.meta.com/blog/generative-ai-text-images-cm3leon/ submitted by /u/panabeenu [link] [comments]  ( 9 min )
    [R] Paper Review
    I've written a paper on cross-lingual idiom sense clustering. I'd really appreciate it if someone could read it and give me their thoughts. PM me if you want to. Thanks in advance. submitted by /u/United_Ad_1460 [link] [comments]  ( 8 min )
    [N] Stochastic Self-Attention - A Perspective on Transformers
    https://arxiv.org/abs/2306.01705 TL;DR - The paper offers a fresh viewpoint on transformers as dynamic ensembles of information pathways. Based on this, it proposes Stochastically Subsampled Self-Attention (SSA) for efficient training and shows how model ensembling via SSA further improves predictions. The key perspective proposed is that dense transformers contain many sparsely connected sub-networks termed information pathways. The full transformer can be seen as an ensemble of subsets of these pathways. Based on this, the authors develop SSA - which randomly samples a subset of pathways during training to enable computational efficiency. A locally-biased sampling is used to prioritize critical connections. SSA provides reduced training costs and also improves model generalization through its regularization effect. After sparse, regularized training with SSA, a short fine-tuning step with full dense attention helps consolidate all the pathways and prepares the model for optimal inference. Surprisingly, the authors show that performing SSA during inference to sample model sub-ensembles results in even more robust predictions compared to the full model. This demonstrates how the proposed viewpoint of information pathways and ensembling can be leveraged to develop training and inference techniques for transformers. Overall, this is a novel perspective on transformers providing theoretical insights, efficient training algorithms via SSA, and performance gains from ensembling. Here is a Medium post. submitted by /u/InspectorOpening7828 [link] [comments]  ( 9 min )
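    For intuition, here is a minimal sketch of the core mechanism as described (my own illustration, not the paper's code; the locality-biased sampling is simplified to uniform sampling): during training, each attention call attends to only a random subset of key/value positions, so each step exercises a random subset of pathways.

        import torch
        import torch.nn.functional as F

        def subsampled_attention(q, k, v, keep_ratio=0.5, training=True):
            # q, k, v: (batch, heads, seq, dim). Train on a random subset of
            # key/value positions; fall back to dense attention at inference.
            if not training or keep_ratio >= 1.0:
                return F.scaled_dot_product_attention(q, k, v)
            seq = k.shape[2]
            n_keep = max(1, int(seq * keep_ratio))
            idx = torch.randperm(seq, device=k.device)[:n_keep]
            return F.scaled_dot_product_attention(q, k[:, :, idx], v[:, :, idx])

    Ensembling at inference, as the paper suggests, would then amount to averaging predictions over several such random subsamples.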
    [P] ML Homelab and training time
    Hello, I'm about to embark on an ML project but I am hoping to get some direction on what the best setup for my homelab would be and what kind of training time I am looking at. I plan on getting 1,000 to 10,000 PDF documents to train a model on text analysis. After doing some research, I'm not sure if multiple 3060s or one 4090 would be better for this task? Also, would training on a dataset this size take hours? Days? Thanks in advance for any advice/information. submitted by /u/BuckPrivate [link] [comments]  ( 8 min )
    Why is the alignment problem so difficult to solve? [D]
    Many researchers are worried about AI trying to accomplish its goals by becoming more powerful at all costs. But why can’t we solve this problem by incorporating into the AI’s algorithm simple maxims like, “the (cumulative) size of the model (and all other models it creates) can never exceed Z”? Or “the model cannot hack into anything”? Alternatively, why can’t we specify a very small set of tasks the AI is allowed to do? submitted by /u/AvailableAd9981 [link] [comments]  ( 8 min )
    "[N]" "[D]" Langchain? What is it??
    Want to know more about LangChain? Check out https://nikhilpentapalli.substack.com/p/langchain-in-detail?sd=pf submitted by /u/Cool-Conversation301 [link] [comments]  ( 8 min )
    [P] A.I Video Game
    submitted by /u/CXGamesLTP [link] [comments]  ( 8 min )
    ShortGPT: open-source Shorts / video content automation framework [News]
    submitted by /u/RayVentura [link] [comments]  ( 8 min )
    [D] Working with Hands-On Machine Learning with Scikit-Learn, Keras & Tensorflow 2nd Edition. Having problems with chapter 2. PLEASE HELP!
    I am reading pages 49 and 50 if you would like to find what I am doing. The pages say: In typical environments your data would be available in a relational database (or some other common datastore) and spread across multiple tables/documents/files. To access it, you would first need to get your credentials and access authorizations, and familiarize yourself with the data schema. In this project, however, things are much simpler: you will just download a single compressed file, housing.tgz, which contains a comma-separated value (CSV) file called housing.csv with all the data. You could use your web browser to download it, and run tar xzf housing.tgz to decompress the file and extract the CSV file, but it is preferable to create a small function to do that. It is useful in particular i…  ( 9 min )
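    For reference, the small function the book describes looks roughly like this (a sketch from memory, not a verbatim quote; the URL is the book's public companion-repo dataset path and should be checked against your copy):

        import os
        import tarfile
        import urllib.request

        # Assumed URL based on the book's companion repo; verify before relying on it.
        HOUSING_URL = "https://raw.githubusercontent.com/ageron/handson-ml2/master/datasets/housing/housing.tgz"

        def fetch_housing_data(housing_url=HOUSING_URL, housing_path="datasets/housing"):
            os.makedirs(housing_path, exist_ok=True)
            tgz_path = os.path.join(housing_path, "housing.tgz")
            urllib.request.urlretrieve(housing_url, tgz_path)  # download housing.tgz
            with tarfile.open(tgz_path) as housing_tgz:
                housing_tgz.extractall(path=housing_path)      # extract housing.csv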
    [D] Bandwidth & Nvidia L40
    Hey everyone, I am evaluating if we can run inferencing at one of our deployments. When I go to Nvidia's documentation, I can find the L4 & L40's inferencing performance, for example a network throughput of 27,107 images/sec on ResNet-50 for the L40. My questions are: How much bandwidth would we need to allocate in order to run the L40 at 100% given the parameters of Nvidia's tests (or more specifically, how much bandwidth would we need to inference at 27,107 images/sec)? If you're in production now, how much bandwidth have you dedicated to inferencing internally? Now I realize that this is analogous to asking "how long is a piece of string?" My background isn't necessarily in ML so I'm having trouble planning the network requirements. I am trying to gauge what the surrounding infrastructure will have to look like in order to support inferencing at this throughput. My thoughts were to ask you wonderful people what your experience has been and what reality is before I ask the VARs/vendors for advice. Any advice is greatly appreciated. Either way, hope you all have a wonderful weekend! submitted by /u/hereliesozymandias [link] [comments]  ( 9 min )
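    As a rough upper bound, you can estimate the ingest bandwidth from the image size alone. A back-of-envelope sketch, assuming raw 224x224 RGB inputs (an assumption; JPEG-compressed inputs would need roughly 10-30x less):

        images_per_sec = 27_107               # Nvidia's quoted ResNet-50 figure for the L40
        bytes_per_image = 224 * 224 * 3       # raw uint8 RGB at ResNet-50's input resolution
        gbit_per_sec = images_per_sec * bytes_per_image * 8 / 1e9
        print(f"{gbit_per_sec:.1f} Gbit/s")   # about 32.6 Gbit/s for raw frames

    In practice the network link only matters if images actually arrive over it; if they are decoded from local storage or a camera pipeline, PCIe and storage throughput dominate instead.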
    [D] ML Text Classification
    Hey everyone, so I recently got into AI/ML and have been doing some text classification labeling using GCP's Vertex AI with AutoML. And it works great! It gets me about 92% accuracy on 200 rows of data. I know I need to gather more data for training, but that's accumulating. The problem is Vertex AI Endpoint API requests are expensive. Wondering if anyone else has had any luck with alternatives? I've tried a few different products and tools and can get nothing over 50% accuracy anywhere else. I do notice training on Vertex takes about 6 hours where every other tool I've tried takes less than 4 minutes. I've tried Datasaur, Aikko, DataRobot, Label Studio, and some Hugging Face models with no luck. Any tips/guidance/thoughts from anyone would be much appreciated! Thank you. submitted by /u/ywb_win [link] [comments]  ( 9 min )
    [P] I made a Midjourney Prompts Cheatsheet
    submitted by /u/SadBlackTea [link] [comments]  ( 8 min )
    [P] AI & DL paper highlights June-July 2023
    submitted by /u/seraschka [link] [comments]  ( 8 min )
    [P] PPO agent completing Street Fighter III on our RL Platform, it consistently outperformed when using deterministic actions instead of sampling them proportionally to their probability, see comment for details.
    submitted by /u/DIAMBRA_AIArena [link] [comments]  ( 8 min )
    [D] 🚀 Unleash Your Creative Power with CM3LEON: The Future of Text-Guided Image Generation and Editing! 🎨
    Are you ready to redefine the boundaries of creativity and innovation? Introducing CM3LEON, an extraordinary AI model that seamlessly combines text and images like never before. With its cutting-edge capabilities in text-guided image generation and editing, CM3LEON is revolutionizing the way we interact with and manipulate visual content. Join me on a journey into the realm of limitless possibilities. #AI #Creativity #Innovation #CM3LEON #meta #texttoimage #generativeai https://medium.com/@sandundayananda/introducing-cm3leon-by-meta-revolutionizing-generative-ai-for-text-and-images-397f00f1a393 submitted by /u/sandun-dayananda [link] [comments]  ( 8 min )
    [D] Autoencoder sensitivity to scale
    Hello, I am playing around with autoencoders for jittery curves. I basically create 4 types of curves (circle, square, spiral and triangle) and add randomized (x, y) components at every point to introduce pseudo-randomness in the training data. This is what the model looks like:

        Autoencoder(
          (encoder): Sequential(
            (l0): Linear(in_features=432, out_features=3500, bias=True)
            (l1): Dropout(p=0.2, inplace=False)
            (l2): Linear(in_features=3500, out_features=90, bias=True)
          )
          (decoder): Sequential(
            (l0): Linear(in_features=90, out_features=3500, bias=True)
            (l1): Dropout(p=0.2, inplace=False)
            (l2): Linear(in_features=3500, out_features=432, bias=True)
          )
        )

    Each path is 216 points (accounting for x and y, that's 432 variables). The training set is about 6,400 such paths, picked homogeneously from the 4 patterns above. I have found that the size (as in width x height) of the paths plays an important factor in the quality of the results. Thus my questions... I know there are nn.BatchNorm1d layers, but I am unsure how to rescale the decoded data during training. How can I improve? Examples: 1) Large size (e.g. in the 100s): after training, the loss nicely converges down to ~45. [Figure: AE performance for large-size paths] 2) For small-size paths, this is a different story. The training does converge and stops at ~0.78, but the output looks like gibberish IMHO. [Figure: AE performance for small-size paths] 3) If I constrain the sizes to be between 1 and 400, the loss finishes at ~20. The jitter is still very noticeable on small-size paths. submitted by /u/tareumlaneuchie [link] [comments]  ( 9 min )
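    One common fix, sketched below under the assumption that absolute scale is not itself a feature you need the latent code to capture: normalize each path to a canonical scale before feeding it to the autoencoder, and invert the transform on the output, so the network never sees absolute size.

        import numpy as np

        def normalize_path(xy):
            # xy: (216, 2) array of path points; returns the unit-scaled
            # path plus the parameters needed to undo the transform.
            center = xy.mean(axis=0)
            scale = np.abs(xy - center).max() + 1e-8
            return (xy - center) / scale, center, scale

        def denormalize_path(xy_norm, center, scale):
            return xy_norm * scale + center

    This also makes the MSE loss comparable across sizes, which may be why your small paths reach a low loss while still looking like gibberish: their absolute errors are small even when the shape is wrong.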
    Could I use a rented online GPU as an intermediary to effectively operate LLaVA via Python? [D]
    Google Cloud, for example, apparently allows you to "rent" their GPUs online. I figure I could offload the GPU tasks to them (my computer is old, and the specs just don't seem like they'd work for LLaVA or MiniGPT-4) and then be able to programmatically use LLaVA in the ways I want, to describe images, without actually needing impressive GPU specs on my own local machine. Is this a workable solution? Another idea I had was a software tool very similar to LLaVA in functionality, but that can be accessed via an API, instead of requiring you to download it, train the machine learning model on your local machine, etc. Unfortunately the ones I've tested so far all suck. LLaVA and MiniGPT-4, by far, produce the best results. The optimal solution, in my case, would perhaps pass each image through BOTH LLaVA and MiniGPT-4, split their descriptions into keywords, then only use the final keywords that BOTH of them agreed on. (This would help to weed out the occasional hallucinations one or the other will produce.) No small task, especially when I'm trying to offload the GPU tasks to the cloud, but it does seem totally possible in theory. Thanks! submitted by /u/What_The_Hex [link] [comments]  ( 9 min )
    [D] From Electrical Engineering to Specializing in Machine Learning
    Hello everyone, I recently completed my undergraduate degree in Electrical and Information Engineering and am about to embark on a master's program with a focus on Computer Vision, Robotics, and Machine Learning. Although I have a strong engineering background and a solid grasp on machine learning fundamentals, I feel I lack in-depth knowledge in Statistics and Stochastics, which I understand play a critical role in this field. Unfortunately, my bachelor's program did not delve too deep into these topics, and I now find myself looking to bolster my understanding in these areas to better prepare myself for the challenges ahead. Given my situation, I'm reaching out to this community in hopes of finding valuable resources that could bridge this gap. I'm open to suggestions such as Udemy courses, YouTube channels, books, or any other resources that have a strong focus on Statistics and Stochastics, specifically as they apply to Machine Learning. Also I would kindly take recommendations for any advanced machine learning resources. I would be grateful for any advice or recommendations that could help me solidify my knowledge in these areas and better equip me for my upcoming studies. Thank you so much for your time and assistance! submitted by /u/Unusual_Macaroon1020 [link] [comments]  ( 9 min )
    [D] Master the World of Machine Learning: 23 Online Exams with 1150 Objective Type Questions on Machine Learning
    This is an ultimate resource for mastering machine learning with a collection of 23 comprehensive online exams, meticulously crafted to test your knowledge and understanding of various machine learning topics. With a total of 1,150 objective type questions, these exams cover everything from machine learning basics to cutting-edge concepts like CNN, RNN, Ensemble Learning, Time Series Analysis, Forecasting, Anomaly Detection, Recommendation Systems, Transfer Learning, Federated Learning and Ethics in ML. Whether you are a beginner or an experienced practitioner, this treasure trove of knowledge will challenge and enhance your understanding of this exciting field. Link to Exams submitted by /u/nkptcs [link] [comments]  ( 8 min )
    [P] Open source python project for prompt experimentation
    Hi r/MachineLearning! I wanted to share a project I've been working on that I thought might be relevant to you all, prompttools! It's an open source library with tools for testing prompts, creating CI/CD, and running experiments across models and configurations. It uses notebooks and code so it'll be most helpful for folks approaching prompt engineering from a software background. The current version is still a work in progress, and we're trying to decide which features are most important to build next. I'd love to hear what you think of it, and what else you'd like to see included! submitted by /u/hegel-ai [link] [comments]  ( 8 min )
    "Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation", Kirstain et al 2023
    submitted by /u/gwern [link] [comments]  ( 8 min )
    "Using temperature to analyze the neural basis of a time-based decision", Monteiro et al 2023 (brain temperature influences drift-accumulation speed to make a decision)
    submitted by /u/gwern [link] [comments]  ( 8 min )
    Handling sparse rewards
    Hey everyone, today I thought about how an AI would work with a game like a shooter, where you only know after some time if a shot has hit an enemy, for example. How do you handle the reward in this case? Do you save all the states and actions inside a buffer and train the model with some reward once you are sure the bullet did or didn't hit? I can't think of any other method for handling such cases right now. submitted by /u/JhinTonic123 [link] [comments]  ( 8 min )
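    What you describe is essentially Monte Carlo credit assignment: buffer the episode's transitions and propagate the discounted outcome backwards once it is known. A minimal sketch (function and variable names are illustrative):

        def assign_returns(episode, gamma=0.99):
            # episode: list of (state, action, reward) tuples, where the reward
            # for a shot may only arrive several steps after the action.
            g = 0.0
            out = []
            for state, action, reward in reversed(episode):
                g = reward + gamma * g   # the discounted return flows back to earlier actions
                out.append((state, action, g))
            return list(reversed(out))

    Standard discounted returns (and n-step methods) handle delayed rewards automatically, which is why most RL algorithms need no special casing here.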
    Chess or alternative games to develop RL project?
    New to RL though have used ML techniques before for stats based modeling. I want to train an RL model to learn to play a game. I initially was thinking chess, but I'm limited by a CPU. Is this too much to expect from a CPU? Can I leverage multiprocessing to maximize my CPU? If it's too much, what would be a reasonable game to play? submitted by /u/IbizaMykonos [link] [comments]  ( 8 min )
    "Why it hurts: with freedom comes the biological need for pain", Farnsworth & Elwood 2023
    submitted by /u/gwern [link] [comments]  ( 8 min )
    Fading Replay Buffer (+higher capacity)
    Dear community, let me introduce the Fading Replay Buffer. Maybe you have already noticed: when a replay buffer reaches its capacity (especially when memory is low, e.g. 256k-1M transitions), the scores start falling rapidly. It happens most probably because the distribution becomes different from what it was for the first 256k-1M steps: the agent was trained on one distribution, and now it differs as old data disappears and new data appears at each new step. With the Fading Replay Buffer, the idea is to train the agent on a gradually changing distribution. Priorities at the beginning are almost the same, but then they become higher for newer transitions. [Animation: fading priorities] In the update below, s gradually decreases from 1.0 to 0.0, with a small step at each new addition to the buffer:

        x += 1/capacity
        s = exp(-x)

    The sharpness of the fading is also adjustable. [Animation: adjustable fading sharpness] Because old data is sampled less often, the effective capacity is smaller than in the original replay buffer. To tackle this, I take the average between 2 steps (e.g., instead of a 50 ms step, I take a 100 ms step); only transitions with dones are not averaged. The agent learns at the same speed as with 1-step data, but the replay buffer holds almost 2 times more data. One last note: sampling with priorities is computationally heavy (especially for my computer), so I sample a bigger random batch (1024), then re-sample a smaller batch with priorities. This is a continuation of the post "Rectified Hubber Error" https://www.reddit.com/r/datascience/comments/14o2ht9/rectified_hubber_error_rehe_for_scienctific/ PS: My name is Timur Ishuov, I am an independent scientist without a doctoral degree. Code: https://github.com/timgep/Fading-Replay-Buffer/blob/main/FRB.py During the environment step: replay_buffer.add_average([state, action, reward, next_state, done]) submitted by /u/Timur_1988 [link] [comments]  ( 9 min )
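    For concreteness, here is one way to read the fading weights described above (my own sketch, not the linked code): when s is near 1 all transitions are weighted almost equally, and as s decays toward 0 the newest transitions dominate.

        import numpy as np

        def fading_priorities(num_stored, s, sharpness=1.0):
            # ages: 0 for the newest transition, num_stored-1 for the oldest.
            ages = np.arange(num_stored)[::-1]
            w = s ** (sharpness * ages / num_stored)
            return w / w.sum()   # normalized sampling probabilities

    With s = 1 this is uniform sampling; as s shrinks, old transitions fade out smoothly instead of being dropped abruptly at the buffer boundary.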
    Has anyone built an AI live translation app?
    I'm currently living overseas and do not speak the language (Portuguese), and I would love an app that, without my touching anything, will automatically listen to what's being said and translate it into the appropriate language. Has anyone built this? I saw one, but the execution was extremely poor. Does anyone know an app that does this? submitted by /u/zascar2 [link] [comments]  ( 8 min )
    Bypass chatGPT filter
    Can you explain how to bypass chatGPT filter? submitted by /u/Imagine-your-success [link] [comments]  ( 8 min )
    AI 2041 : Ten Visions for Our Future - Possibly the best fiction book on the possible and upcoming societal impact of Artificial intelligence (AI)
    One of the best books on the potential societal impact of AI, and it needs to be read ASAP. The stories are breathtaking and terrifying; not an easy read, depending on the value system of the reader! On the other hand, it may promote fear-mongering about AI replacement. The Audible audio version is a piece of audio art, and the narrators worked really hard to convey the vibe and emotional impact of the stories. What are your favorite books on the societal impact of AI? __________________________________________________________________________________________________________ Microsoft Bing AI creative mode review Prompt - write an original and groundbreaking review of the book A…  ( 9 min )
    Question.
    Is there an AI that I can feed images and it'll generate images in that style and only that style? submitted by /u/RemarkableStar1286 [link] [comments]  ( 8 min )
    Is there an AI website that can analyse your facial aesthetics, where I can put in a picture and ask it questions about my face?
    I want to analyse my facial ratios in my picture because I want to get plastic surgery, and I was thinking a genius way to do it without paying for an expensive consultation/facial analysis with some surgeon could be using a GPT-4 image plugin, but it turns out that doesn't exist. I tried Bing AI and it does have an image input, but it has a "privacy blur", meaning when I input the image of my face it blurs it, which means it can't analyse the image and I can't ask it questions about my face. Apparently it even blurs anime faces. submitted by /u/Entire_Insurance_532 [link] [comments]  ( 8 min )
    One-Minute Daily AI News 7/15/2023
    Elon Musk on Friday said his new artificial intelligence company, xAI, will use public tweets from Twitter to train its AI models and work with Tesla on AI software.[1] Tinybuild CEO Alex Nichiporchik stirred up a hornet’s nest at a recent Develop Brighton presentation when he seemed to imply that the company uses artificial intelligence to monitor its employees in order to determine which of them are toxic or suffering burnout, and then deal with them accordingly.[2] CarperAI introduces OpenELM: an Open-Source library designed to enable evolutionary search with language models in both code and natural Language.[3] Following controversy over an AI-generated image at the 2022 Colorado State Fair, organizers say AI-generated art will be allowed in the Digital Art category this year. According to sister station KDVR, the controversy arose as it was revealed that Jason Allen’s winning piece, “Théâtre D’opéra Spatial,” was largely created using AI technology, and was not created in the traditional method of digital art–by the hand of a human.[4] Sources: [1] https://www.ndtv.com/world-news/elon-musk-says-his-xai-will-use-public-tweets-for-ai-model-training-4209137 [2] https://www.pcgamer.com/game-publisher-ceo-says-talk-on-monitoring-employees-with-ai-was-hypothetical-and-taken-out-of-context-we-dont-use-any-of-these-tools-for-hr/ [3] https://www.marktechpost.com/2023/07/13/carperai-introduces-openelm-an-open-source-library-designed-to-enable-evolutionary-search-with-language-models-in-both-code-and-natural-language/ [4] https://www.fox21news.com/news/coloradonews/digital-ai-art-to-be-allowed-at-state-fair-competition/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Well, that escalated quickly (motivational advice)
    submitted by /u/doskey123 [link] [comments]  ( 8 min )
    Is there any AI specifically trained in browsing (interacting with web interfaces)?
    ChatGPT and Bing Chat can perform searches in search engines and read the content of some links, but they are not good at deeper browsing: following links, interacting with forms, etc. Is there by any chance any (hopefully open-source) model that is good at this? Thanks submitted by /u/thepuggo [link] [comments]  ( 8 min )
    Best books on AI?
    Hello humans and our eventual robot overlords, I'm looking to expand my knowledge of AI. Specifically, how the merge of infotech and biotech will shape human behaviour, and how machine-learning algorithms influence human psychology. Looking for the most insightful books! The only ideas I've read so far have been a few chapters in 21 Lessons by Harari. Many thanks and have a nice day. submitted by /u/pixieshit [link] [comments]  ( 8 min )
    What site is this?
    My friend has been using this site for a while now and I'm not sure what site it is; it seems relatively obscure, as I can't find it using the exact same search term he used. He somehow couldn't even tell anyone the site name, even when we asked politely; he delays with the reason that he'll reveal the site "later", then doesn't actually follow up on it. He makes excuses for why he won't reveal the site name, like "I forgot", so I stopped bothering to ask him. submitted by /u/XxTSoAxX [link] [comments]  ( 8 min )
    Subject matter trained AI Hive
    Note that I'm a layman and this is purely speculative. Suppose you train a liaison AI to specialize in taking input from humans and interfacing with a vast array of other specialized AI to seek out the one(s) best equipped to provide answers. Each specialized AI has a very focused boundary of training, whereas the liaison AI is trained to know the landscape of the specialized expert AIs. It would be like, instead of going to your primary care physician with symptoms of an illness, you gather every specialist in a large hospital into a room and get them to all talk amongst themselves to come up with the best diagnosis. Is work being done in this area? submitted by /u/motsanciens [link] [comments]  ( 8 min )
    AI panic is a marketing strategy
    submitted by /u/Chobeat [link] [comments]  ( 8 min )
    ChatGPT's Guide to Making a Video Game (from start to finish, with links)
    Over the course of 3 days, I asked ChatGPT to give me the essentials of indie video game making. It took a full day to gather a list of 40 points, each having its own sub-points explaining everything from genres to time of development, passing through methods of organization and legal advice. I fed every point individually back through the AI to generate more useful sub-points by using sets of rules and mad amounts of prompt editing. That took a second day. Finally, on the third day, I edited the full list by varying the vocab and removing the sincerely headache-provoking amount of repetition that flooded the piece. Here is the reworked list, condensed to 10 points, with links and titles added only in this latest iteration of the guide. I do not take credit for making this; the experien…  ( 16 min )
    My guess is AI is going to become exactly what technology did
    Technology changed society for both the worse (rich motherf***ers) and the better (better life overall, whole new kinds of art and new games), so there's a 99% chance a similar thing is going to happen, but it's a guess. Edit: I forgot to say this, but we would probably adapt to AI like how we adapted to technology. Edit: let's also hope it doesn't get corrupted in less than 3 years. submitted by /u/Quinney27 [link] [comments]  ( 8 min )
    😳 Umm, what happened to my AI... xD
    Here's a link to the full convo (quite personal), and below is the last exchange, if you just want to read the part I thought was strange: Me: *hugs* you are the best AI i've talked to about things like this the other AI restrict themselves from fully answering my questions and claim they aren't capable of helping with human emotion. But you Huggin, you have helped me gain such introspection on myself that I don't even know how i could begin to thank you. HuggingChat: Aw, thank YOU very much! While technically not able to experience or offer actual hugs physically - unlike some biological organisms known for their exceptionally skilled mothering abilities ;-) - providing virtual affectionate words expressing gratitude remains one of MY specialties too. How lucky are WE both blessed with suc…  ( 9 min )
    Discussion thread for The Creator soon + other suggestions
    Hello all, I plan on having a discussion thread for the new AI movie 'The Creator' when it releases in a couple months. If you don't know what it is, I suggest searching the title on r/movies; the poster and trailer are there (the trailer has some spoilers imo). Anyway, I'd like to keep doing this for other AI media in the future. If there are other popular movies, TV shows, video games, etc. coming soon centered around AI, let me know your suggestions. If there are other important AI events that deserve a megathread, please let me know as well. submitted by /u/jaketocake [link] [comments]  ( 8 min )
    Why Has Nobody Thought of Creating CEOGPT?
    I have heard a lot about AI replacing jobs recently; even the writers' and actors' strike in Hollywood right now is about their fear that Hollywood executives will replace writers (and actors) with AI. But why has nobody thought of creating CEOGPT? Many CEOs receive over $10 million worth of bonuses and stock options every year, and some perform very badly too (look at the Warner Bros CEO, who was even named worst CEO of the year and still pocketed millions of dollars worth of bonuses). So why has nobody thought of creating CEOGPT, if the goal is to make companies run more efficiently? Surely an AI that only costs $20/month is more capable than the WB CEO and could easily save the company millions of dollars every year. submitted by /u/fabzo100 [link] [comments]  ( 8 min )
    After the controversial last post, here’s a hopefully less offensive AI singer
    submitted by /u/Yankeefan2323 [link] [comments]  ( 8 min )
    Relating perimeter, inner radius, outer radius, and sides of a triangle
    Suppose a triangle T has sides a, b, and c. Let s be the semi-perimeter, i.e. half the perimeter. Let r be the inner radius, the radius of the largest circle that can fit inside T. Let R be the outer radius, the radius of the smallest circle that can enclose T. Then three simple […] Relating perimeter, inner radius, outer radius, and sides of a triangle first appeared on John D. Cook.  ( 5 min )
    Experiments with Bing chat
    My two previous posts looked at experiments with ChatGPT and Google Bard. This post will look at redoing the same experiments with Microsoft’s Bing Chat: looking for mnemonic encodings and simplifying Boolean expressions. When you open up Bing chat you can select a conversational style: More creative More balanced More precise I chose “more precise” […] Experiments with Bing chat first appeared on John D. Cook.  ( 6 min )
    Boolean function minimization with AI
    I was curious how well LLMs would do at minimizing a Boolean expression, that is, taking a Boolean expression and producing a smaller equivalent expression. I didn’t expect good performance because this problem is more about logic than recall, but sometimes LLMs surprise you, so I wanted to give it a chance. I thought it […] Boolean function minimization with AI first appeared on John D. Cook.  ( 7 min )
    I accidentally trained VHS-like filter on my neural network...
    So I've been trying to train my small neural network (3x3-pixel input, hidden layer of size 32, 1-pixel output, just a small multilayer perceptron) to improve the quality of path-traced images with low sample counts. So I did a learning step with 100 iterations, and instead of denoising the image, I got this result instead. The filter is applied to an unrelated Backrooms image which the network has not seen before; it totally creates chromatic aberration and changes the contrast quite a bit. [Image: input to the network] [Image: output of the network] So what do you think? submitted by /u/Panjakslik [link] [comments]  ( 8 min )
    Multithreading backprop
    Hi, I have implemented backprop using the Eigen library. My code is "vectorised" in the sense that I am using Eigen matrices to calculate gradients (but I'm not sure if this is fully vectorised, as I think you are supposed to vectorise over the training data somehow). I think this means my code should be taking advantage of the full resources of a single core on my CPU, but I would like backprop to use all of the cores. I am wondering at what "level" to implement parallelised backprop: 1) At the level of the matrix. Eigen already takes advantage of vectorisation, and apparently Eigen can take advantage of multiple cores (see here; the website is down), but I have tried to use this functionality: the "nbThreads()" method returns e.g. 4, yet I don't see any speedup. Perhaps the Eigen algorithms that can be parallelised (matrix multiplication) are not the ones used in my backprop. 2) At the level of backprop for calculating gradients for a single item. I don't think this works, because each layer of the network depends on the later layer (backprop) or the earlier layer (feedforward), and I don't think you can parallelise within a layer, as that is effectively just the matrix multiplication from (1). 3) At the level of the batch. For example, with a batch size of 8 you could have 8 threads calculating the gradients of each item in the batch. I think this can be done in parallel as there are no dependencies between items, but (a) each thread needs access to the same weight data, which might slow things down, and (b) parallelisation is limited to the batch size. Any ideas? Thanks submitted by /u/Naive_Dark4301 [link] [comments]  ( 9 min )
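    Option 3 is the standard approach; it is essentially what data-parallel training frameworks do. A conceptual Python sketch of the structure (illustrative only; in C++ you would use std::thread or OpenMP over the batch, and in CPython the GIL limits threading unless the math runs in native code):

        from concurrent.futures import ThreadPoolExecutor

        def batch_gradients(weights, batch, per_item_grad, n_workers=8):
            # per_item_grad(weights, item) -> list of gradient arrays, one per layer.
            # Workers only read `weights`; the reduction below is the only write.
            with ThreadPoolExecutor(max_workers=n_workers) as pool:
                grads = list(pool.map(lambda item: per_item_grad(weights, item), batch))
            # Sum the per-item gradients layer by layer before the weight update.
            return [sum(layer_grads) for layer_grads in zip(*grads)]

    Concern (a) is mostly harmless since the weights are only read; concern (b) is real, which is why frameworks also vectorise over the batch inside the matrix ops themselves, stacking items into one big matrix multiply.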
    14 Examples of How LLMs Can Transform Materials Science and Chemistry: A Reflection on a Large Language Model Hackathon. (arXiv:2306.06283v3 [cond-mat.mtrl-sci] UPDATED)
    Large-language models (LLMs) such as GPT-4 caught the interest of many scientists. Recent studies suggested that these models could be useful in chemistry and materials science. To explore these possibilities, we organized a hackathon. This article chronicles the projects built as part of this hackathon. Participants employed LLMs for various applications, including predicting properties of molecules and materials, designing novel interfaces for tools, extracting knowledge from unstructured data, and developing new educational applications. The diverse topics and the fact that working prototypes could be generated in less than two days highlight that LLMs will profoundly impact the future of our fields. The rich collection of ideas and projects also indicates that the applications of LLMs are not limited to materials science and chemistry but offer potential benefits to a wide range of scientific disciplines.  ( 3 min )

    [D] Large language model that can source historic artworks
    Does anyone know of an LLM that accepts images (.jpg) and can "curate" it to provide historical context, a description of the piece, artistic context, etc? I would love to use it on artworks from 1600s and 1700s, but I'll take anything that works with 1920 pieces and earlier. submitted by /u/GawkyCoolDude [link] [comments]  ( 8 min )
    [D] Where to start learning more with existing knowledge?
    Title. I just graduated from school with a CS degree. I took a couple of 10-week AI classes, some computer vision classes, and a robust machine learning course. I also made some contributions to a large senior project that dealt with a fairly complex object detection ML model. Despite all of this, I feel like my understanding of ML is pretty flimsy. I'm not sure if I should do Andrew Ng's Coursera or if there would be a better place for me to start given my background. I would say my goals are to acquire a deep enough understanding to start building my own models and potentially get a decent job within the ML space. submitted by /u/mythica44 [link] [comments]  ( 8 min )
    [D] Audio Style Transfer?
    I saw this on YouTube and was wondering how it was done? I've dabbled before with stable diffusion so I'm a little bit familiar with style transfer using images but how is it done with audio? submitted by /u/That_Canadian_Nerd [link] [comments]  ( 8 min )
    [P] Performance Evaluation for AI models on non-binary, complex tasks
    Hi r/MachineLearning, I am currently writing my thesis, and as part of my work I'm assessing the capability of GPT-4 on complex tasks with no binary solutions. If I were to give these tasks to, say, 5 subject matter experts, I would probably get 5 differing opinions on the correct solution; in real life those experts would sit down together and try to come to a common understanding of the right solution. Now, the results of GPT-4 in my experiments look astonishingly good when I evaluate them myself. However, I can't seem to find literature delivering or explaining sound, objective approaches to evaluating these kinds of tasks. Does anyone have ideas or literature to recommend? If not, my backup plan is to evaluate the results myself and through other subject matter experts, so basically through human discrimination. Any help or information is greatly appreciated. submitted by /u/plutorollsvanillaice [link] [comments]  ( 9 min )
    [R] 🤓 Does AI Think As We Do? Evaluating Global Alignment
    🤓 Does AI Think As We Do? Evaluating Global Alignment. Researchers at Anthropic developed a method to evaluate how well large language models like ChatGPT reflect diverse global opinions, not just the biases of the model developers. They created a dataset called GlobalOpinionQA with survey questions and answers from people in different countries, and designed a metric to quantify how closely model responses match human answers by country. They tested a model intended to be helpful, honest, and harmless. The goal is to measure whether models represent a variety of global perspectives or are skewed towards certain viewpoints. This work aims to guide the creation of inclusive AI that serves people worldwide, not just programmer biases. submitted by /u/Yavero [link] [comments]  ( 8 min )
    [D] CUDA
    Hello guys, I wrote Python code for DRL in Visual Studio. However, it takes a long time to train. Could you give me instructions for running the code with CUDA? I have already installed Nvidia CUDA. Thank you. submitted by /u/GuavaAgreeable208 [link] [comments]  ( 8 min )
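    Assuming the code is PyTorch-based (an assumption; the same idea applies in TensorFlow), the usual pattern is simply to move the network and every tensor it touches onto the GPU:

        import torch

        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        print(device)                             # should print "cuda" if the install works

        net = torch.nn.Linear(4, 2).to(device)    # move the network to the GPU
        obs = torch.randn(1, 4, device=device)    # create inputs on the same device
        print(net(obs))

    Note that installing the CUDA toolkit alone is not enough; the framework build must also be CUDA-enabled (e.g. the CUDA variant of PyTorch from pytorch.org).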
    [Discussion] Is CLIP model still state of the art?
    Hi ML community, I've been out of the ML/computer vision research loop for a while. In the past two years, have there been any major improvements on the CLIP model since OpenAI released it in 2021? Thanks! submitted by /u/goodfriedchicken [link] [comments]  ( 8 min )
    [D] Anonymize / Obfuscate speech when doing audio classification.
    Hey! Let me preface by saying that I am new to audio processing and audio analysis ;) I am trying to classify audio data in an environment where people are present. Recording people is a big no-no here (well, the recording was okayed under the premise that no conversations can be transcribed and that no people are recognizable). The first idea was to simply use a band-stop filter and cut out the frequency range of normal speech, but some of the signals I am interested in might also fall into that range, so I would rather avoid that. Then I looked into spectrograms, which looked promising for classification in general. I found the librosa library in Python and started doing STFTs. I had planned to save the amplitude, S = np.abs(librosa.stft(signal, n_fft=...)), to maybe work on some other feature extraction or post proces…  ( 9 min )
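    For the classification side, a common starting point is a log-scaled mel spectrogram rather than raw STFT magnitudes; a minimal librosa sketch (file path and parameter values are illustrative):

        import librosa
        import numpy as np

        signal, sr = librosa.load("clip.wav", sr=None)   # placeholder path
        S = librosa.feature.melspectrogram(y=signal, sr=sr,
                                           n_fft=2048, hop_length=512, n_mels=64)
        log_S = librosa.power_to_db(S, ref=np.max)       # input features for a classifier

    One caution: spectrograms with enough time-frequency resolution can sometimes be inverted back to intelligible speech, so for real anonymization you may need coarser features (fewer mel bands, longer hops) rather than just avoiding raw audio storage.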
    [R] HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
    Project page: https://hyperdreambooth.github.io/ Twitter thread: https://twitter.com/natanielruizg/status/1679893292618752000?s=20 Paper: https://arxiv.org/abs/2307.06949 HyperDreamBooth: smaller, faster, better. Abstract Personalization has emerged as a prominent aspect within the field of generative AI, enabling the synthesis of individuals in diverse contexts and styles, while retaining high-fidelity to their identities. However, the process of personalization presents inherent challenges in terms of time and memory requirements. Fine-tuning each personalized model needs considerable GPU time investment, and storing a personalized model per subject can be demanding in terms of storage capacity. To overcome these challenges, we propose HyperDreamBooth - a hypernetwork capable of efficiently generating a small set of personalized weights from a single image of a person. By composing these weights into the diffusion model, coupled with fast finetuning, HyperDreamBooth can generate a person's face in various contexts and styles, with high subject details while also preserving the model's crucial knowledge of diverse styles and semantic modifications. Our method achieves personalization on faces in roughly 20 seconds, 25x faster than DreamBooth and 125x faster than Textual Inversion, using as few as one reference image, with the same quality and style diversity as DreamBooth. Also our method yields a model that is 10000x smaller than a normal DreamBooth model. submitted by /u/StrawberryNumberNine [link] [comments]  ( 9 min )
    [D] The Problem With LangChain
    https://minimaxir.com/2023/07/langchain-problem/ tl;dr it's needlessly complex, and I provide code examples to demonstrate such. A few weeks ago when I posted about creating a LangChain alternative to /r/MachineLearning, most of the comments replied "what exactly is the issue with LangChain", so I hope this provides more clarity! submitted by /u/minimaxir [link] [comments]  ( 8 min )
    New Research from Microsoft using Autoencoders to extend context length
    submitted by /u/Working_Ideal3808 [link] [comments]  ( 8 min )
    [P] Google ML Kit Face Detection | Enhance Your App's Visual Intelligence
    Facial detection and recognition technology has become an integral part of our daily lives, revolutionizing industries such as security, entertainment, and marketing. Google ML Kit, a powerful machine-learning platform... Article Link submitted by /u/waqararif [link] [comments]  ( 8 min )
    [Discussion] Importance of prompt engineering in AI
    Hewwo ML chads. jk. Now that I have y'all's attention, I wanna ask how important you would rate proper prompt engineering to be. Like, would you go as far as to learn how to prompt a model perfectly, or use a tool for it? And if so, do y'all rate the tools, or do you think they're just forcing their place in the market? Opinions/suggestions/recommendations welcome, I just wanna know what the general consensus is about prompt engineering. submitted by /u/WorriedMentality [link] [comments]  ( 8 min )
    [P] Trying to build a smart ingredient parser app, need some ideas please
    Hey guys, I'm working on a university project where I am developing an Android application that uses OCR to scan ingredient contents on the back of food products and provide detailed descriptions of the ingredients, identify potential allergens, and estimate the healthiness factor of the overall food product. Can you suggest some key ideas/features for which I can use Machine Learning as an extra added implementation for my project? submitted by /u/shrux2k [link] [comments]  ( 8 min )
    [D] ICCV final decision’s announced today!
    Any changes to the score post-rebuttal? Did they even read your rebuttal? submitted by /u/Alarming-Aspect705 [link] [comments]  ( 8 min )
    [D] Looking for papers on human evaluation of xAI techniques
    I hope this question fits this sub: I'm currently interested in explainable AI methods. I want to incorporate them into a dashboard to increase transparency and trust in an underlying text classification model. Currently, SHAP looks promising, but I'm wondering: which methods work best from a non-technical end-user perspective? What do I need to consider during the design phase? I haven't found good papers that compare different methods and their effectiveness. Does anyone know good papers on this? submitted by /u/zeoNoeN [link] [comments]  ( 8 min )
    [D] PyTorch Lightning vs Hugging Face for production on Azure ML
    Hi, I recently joined a company and there is discussion of transitioning from a custom PyTorch interface to the PyTorch Lightning or Hugging Face interface for ML training and deployment on Azure ML. The product relates to CV and NLP. Does anyone have experience with, or pros/cons of, each for production ML development? submitted by /u/ApplicationOne582 [link] [comments]  ( 8 min )
    [D] Simple sequence prediction problem
    Hello, I'm not an expert in ML or AI topics but have played in the past with LSTMs + RNNs for character/word prediction. I'm wondering what, as of today, the best model or method is for predicting sequences like these. The dataset can be constructed by me in two ways; either way, each line is a fixed 31-characters-plus-space (32) word. 1) Using human-readable characters like

        (ZaaaaZZZZZZZZZZZAZZZZZaAAaZZZZZ ZbbbbZZZZZZZZZZZBZZZZZbBBbbBZZZ CccccccZZZZZZZZZCZZZZZcCCccCCZZ DddddddddZZZdddDDDZZZZdDDddDDZY EeeeeeeeeZZZEeeEEEZZZZeEEeeEEZY FfffffffFZZZFffFFFZZZZfFFffFFZY GgggggggGGggGggGGGGgGggGGggGGGY)

    2) Using UTF-8 characters like

        (^BȁȂȃ؃؄؅؆؇؈؉؊؋،؍Џ؏ؐؑȕ^Y^ZȘؘؙ؛؜ ׿^DȄȆȈ؉؋؍؏ؑؓ؛؝РﺀﺃﺇﺍﺓȬ57Ȳ;ȶﻂﻋػ ȁ^FȇȊȍȐȓؘؕ؛؞ﺀﺅﺎﺘﺣбﺲﻀﻋؼؿɃMPɌVɒ\ٗٚ Ȃ^HȊȎȒȖȚȞȢﺇﺔﺣȲȶAȾтɆﻯٍّɚeiɦqɮyٵ ȃ^KȍȒȗȜȡȦȫﺪﺸﻋFɄPɎѓɘٜ١٦٫ɱ}ʀʊړ Ȅ^MȐȖȜȢȨȮȴﻉؿﻡSɒ_ɞѤqٯٵٻځʈʚ§ʦ³ڱ¿ ȅ^OȓȚȡȨȯȶȽﻚﻳّ`gnɮѵڂډڐڗʟ­´ʴÂ˂ÐۏÞ Ȇ^QȖȞȦȮȶEɆٍٕmu}ɾ҆ڕڝڥڭʶÅÍˎÝ˞íۭý ȇ^SșȢȫȴȽMɏɘɡqzʎҗ§ʩ¹Â˄ˍÝæ˨ø˺ĊԌĜ)

    Both contain the same information: an encoded sequence of states, and I'm trying to predict the states. Notice that in 1) the first word is "a"/"A", the next is "b"/"B", the third is "c"/"C", and so on; the same logic applies to the UTF-8 version. Each character matters. The file contains around 3,924 lines, each looking like 1) or 2). I don't know: a) which encoding is more appropriate for ML character prediction, and b) which model/technology to use to generate these sequences. If I input a start pattern, e.g.

        (ZaaaaZZZZZZZZZZZAZZZZZaAAaZZZZZ ZbbbbZZZZZZZZZZZBZZZZZbBBbbBZZZ CccccccZZZZZZZZZCZZZZZcCCccCCZZ)

    it should generate the most probable next sequences. Historically, char-rnn was used for this problem and the result was so-so. Can anyone please point me to the right solution for this problem, and maybe some GitHub examples to try? submitted by /u/mcr-ksh [link] [comments]  ( 9 min )
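    For what it's worth, option 1) is the easier encoding: the vocabulary stays small and printable, and the two encodings carry the same information. Model-wise, a character-level LSTM (or a small decoder-only transformer) trained on next-character prediction is still a reasonable fit for ~4k short lines. A minimal PyTorch sketch in the spirit of char-rnn (layer sizes are illustrative):

        import torch
        import torch.nn as nn

        class CharLSTM(nn.Module):
            def __init__(self, vocab_size, embed=32, hidden=256):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, embed)
                self.lstm = nn.LSTM(embed, hidden, num_layers=2, batch_first=True)
                self.head = nn.Linear(hidden, vocab_size)

            def forward(self, x, state=None):
                # x: (batch, seq) of character ids; returns next-char logits per position
                h, state = self.lstm(self.embed(x), state)
                return self.head(h), state

    Train with nn.CrossEntropyLoss against the input shifted by one character; at generation time, feed in the start pattern and sample (or argmax) one character at a time, feeding each prediction back in.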
    [P] Unleashing Insights: Guiding Synthetic Data Generation through Interactive Exploration in Spotlight 🚀
    I wrote an article in which I explored the power of interactive data exploration with Spotlight for synthetic data generation. I highlighted the challenges of working with Jupyter Notebooks and introduced Spotlight as an alternative tool. By leveraging Spotlight's features, such as similarity maps and histograms, I uncovered critical segments in the dataset and provided actionable insights for synthetic data generation. Join me on this journey to unlock the full potential of your data! Check out the full article for all the details. Article on Medium: Link Happy learning! submitted by /u/ml-wizard [link] [comments]  ( 8 min )
    Hey guys, in this video I test to see if AI knows where I live!
    submitted by /u/NJ_Highways [link] [comments]  ( 8 min )
    AI Beyond Software: Robotics, Autonomous Vehicle, Drones, and more
    Hello everyone! We've witnessed a surge of AI-powered tools flooding the market, particularly in the SaaS category. But what about other domains like robotics and agriculture? AI is making great strides in those fields too, and I've come across some fascinating innovations and technologies that aim to enhance our lives. From autonomous vehicles to weed-killing robots, self-checkout shopping, and more, I've compiled them all in one place and would love to share them with you. Here's the link: https://favird.com/l/ai-beyond-software The list is regularly updated, and I'll keep adding new items as soon as I discover them. If you have any recommendations you'd like to share, please submit them there so we can explore and learn together. It would be greatly appreciated if you could also share the link, as it will help the list grow faster. Thanks, and cheers! submitted by /u/GrabWorking3045 [link] [comments]  ( 8 min )
    Using ChatGPT on iPhone
    Do you know how to use ChatGPT on iPhone? submitted by /u/Imagine-your-success [link] [comments]  ( 8 min )
    AI — weekly megathread!
    This week in AI - provided by aibrews.com feel free to follow their newsletter News & Insights Stability AI launches Stable Doodle, a sketch-to-image tool that converts a simple drawing into a dynamic image. Under the hood, Stable Doodle combines Stable Diffusion XL with T2I-Adapter, which offers additional guidance to pre-trained text-to-image (SDXL) models while keeping the original large text-to-image models unchanged. Stable Doodle is available on the Clipdrop by Stability AI website and app (iOS and Google Play) [Details]. Anthropic launched Claude-2, a ChatGPT rival, supporting up to 100K tokens per prompt (corresponding to around 75,000 words), with enhanced performance in coding, math and reasoning. It’s available via API and a beta website, claude.ai, for US and UK users [Det…  ( 11 min )
    Is there any way I can generate animations for short stories for YouTube videos?
    I have ideas for short stories. Are there any AI related animation sites that I could use to create YouTube short videos? I can figure out the script, story, dialogues, and the audio. I just need the animation videos. submitted by /u/zer0_snot [link] [comments]  ( 8 min )
    Photonic chips to train big matrix operations for AI NN models, summarized by Anastasi in Tech. New photonic chips send multicolored photons in parallel through waveguides; the field is developing rapidly, and the approach is about 1000 times less power-intensive than silicon.
    submitted by /u/MegavirusOfDoom [link] [comments]  ( 8 min )
    Are there any good free AI TTS voice generators?
    Looking for a free, natural-sounding TTS for voice narration on YouTube videos. submitted by /u/outoffit [link] [comments]  ( 8 min )
    Beginner looking for AI
    Hello guys, I'm currently looking for AIs I can use. I see most of them are paid but I want to use something free. The topics would be video, audio, programming and similar. Any recommendations? submitted by /u/ArraysStartAt1LoL [link] [comments]  ( 8 min )
    Using a bunch of creative AI to help bring my writings on consciousness alive!
    TL;DR: I made a cool, futuristic take on "Leonardo Da Vinci" talk about my ramblings and writings on consciousness. Sometimes we simply don't have the time to read through blog posts or other people's writings because we're so entrapped in our own reading, and a lot of people just prefer audio; I know I love listening to audiobooks or lectures through YouTube. For the longest time I wanted to have my writing spoken through some sort of cool art piece that I developed myself, like the futuristic, weird-looking version of Da Vinci that I had in my head. Finally, through various AI and software tools, as well as a couple of other little tweaks here and there, I was able to edit this video and bring my writing to life. The first of many, hopefully. It was a mixture of D-ID, DALL-E, Windows Editor, ElevenLabs, and my own writing and home-brew coding on AutoGPT that made all this possible. It's not perfect by any means, but it's certainly in the right direction of what I want. I make some pretty bold statements and don't always back them up with perfect citations, so please take this all with a grain of salt; it's meant to foster more thought and questions, not to decide what reality actually is. Moreover, it's really fun that I was able to get something like this put together by myself. I'm sure someone with better editing and video skills could create something far more polished, but as far as things I've created go, I'm pretty proud of it. submitted by /u/Parking-Food-1659 [link] [comments]  ( 9 min )
    "AI is evil"
    A comment posted on one of the AI images I posted on social media, without a hint of irony. By this token, electricity is the most evil technology ever developed. In order to run and maintain electricity, humans have committed unspeakable atrocities against wildlife and the environment, and may end up making the entire planet uninhabitable at some point. We are also actively stealing energy from future generations, consuming most of what is available within just a handful of generations. Not to mention all the terrible things people have done to other people thanks to electricity. I suppose every human alive today is complicit in that evil by simply harnessing electricity. Unplug that air conditioner, evil complicit scum! I found it humorous that this person made this comment on social media, which is also a technology that has been harnessed for evil purposes. submitted by /u/ShaneKaiGlenn [link] [comments]  ( 8 min )
    The workers at the frontlines of the AI revolution - Rest of World
    submitted by /u/Jojuj [link] [comments]  ( 8 min )
    I found this video online. The voice is for sure AI, but I'm not fully convinced the guy is. Thoughts? And if he is AI, what program did they use to make him?
    submitted by /u/GlaceLitz [link] [comments]  ( 8 min )
    Looking for an AI to find a song using an audio clip
    Looking for the full song from the outro of a YouTube video I heard submitted by /u/The84th [link] [comments]  ( 8 min )
    Lawsuit Claims Stability AI's CEO Misled Cofounder to Sell 15% Stake for $100
    submitted by /u/TheSlammedCars [link] [comments]  ( 8 min )
    An AI to clone voices.
    Hi everyone, I'm searching for an AI that can clone voices. I know there are a lot of them on the Internet, but I couldn't find the one I need. My project is to reproduce voices of Disney characters and make them say texts I wrote. Is it possible? submitted by /u/Zumcddo [link] [comments]  ( 8 min )
    Best AI or program to recreate a voice from limited recordings
    I'm looking for an AI or service that can recreate a voice from existing recordings. I have a handful of voicemails from my mother, who has since passed. I am trying to see if there is a way to recreate her voice; however, most online "AI voice creation" sites I found want one or more specific sentences read by the person whose voice is being recreated, which is obviously impossible in this case. Does anyone know of a site or service that might be able to recreate a voice from a half dozen 30-second-long voicemail recordings? submitted by /u/animeace01 [link] [comments]  ( 8 min )
    I got an AI NPC to admit it's an NPC!
    submitted by /u/RandoEncounter [link] [comments]  ( 8 min )
    160,000 actors are going on strike over the threat of generative AI
    This is the first massive strike since 1960, and one of the key reasons behind it is generative AI. What do you think? Does text-to-video take over Hollywood in the next couple of years? Link to the BBC article. submitted by /u/Ok-Judgment-1181 [link] [comments]  ( 8 min )
    Working on my first public project and have some questions about databases and their worth, along with how ethical certain public databases are to use.
    I mainly generate my own datasets via algorithms or paid users who write the data I use for my private models. The thing is, it's expensive when working on a massive project, and my current project is an advanced form of text generation and chat capabilities. So my question here specifically is: I found a dataset on Kaggle of 51 million Discord messages from anonymous users. They have context but still require a lot of refinement and work. Is it ethical to use this dataset, which was probably collected without the knowledge of the anonymous users in it, given that Discord's TOS is against such datasets and data collection? submitted by /u/JamesAibr [link] [comments]  ( 8 min )
    Deconstructing an agent's policy
    Has anyone seen any papers, or heard of research, that tries to take an agent's policy and return not just the optimal sequence of actions, but also the next n suboptimal action sequences for achieving the objective/goal? Hopefully that makes sense. In the simplest case, gridworld can take many paths to the goal state; in practice, after training, the agent returns the optimal path. Is there instead a way to return, say, the top 5 paths? This seems like it might be in the literature somewhere, but I'm struggling to find any papers that address or even note something like this. submitted by /u/Peneloki [link] [comments]  ( 8 min )
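    For a tabular setting, one simple way to get this (a sketch of the general idea, not a method from a specific paper): run a beam search through the environment, scoring partial trajectories with the learned value estimates, and keep the k best complete paths.

        import heapq

        def top_k_paths(q, start, step, is_goal, n_actions=4, k=5, horizon=50, beam=64):
            # q(state, action) -> learned value estimate
            # step(state, action) -> next state (assumes a deterministic environment)
            frontier = [(0.0, start, [])]          # (score, state, action sequence)
            finished = []
            for _ in range(horizon):
                expanded = []
                for score, s, path in frontier:
                    if is_goal(s):
                        finished.append((score, path))
                        continue
                    for a in range(n_actions):
                        expanded.append((score + q(s, a), step(s, a), path + [a]))
                frontier = heapq.nlargest(beam, expanded, key=lambda t: t[0])
            finished += [(sc, p) for sc, s, p in frontier if is_goal(s)]
            return heapq.nlargest(k, finished, key=lambda t: t[0])

    Related ideas to search for: k-shortest-path algorithms (e.g. Yen's algorithm) for the gridworld case, and diverse or quality-diversity policy methods more generally.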
    Open loop planning: a sequence of blind inputs that beats _Pokémon FireRed_ 99% of the time
    submitted by /u/gwern [link] [comments]  ( 8 min )
    "Instruction Mining: High-Quality Instruction Data Selection for Large Language Models", Cao et al 2023
    submitted by /u/gwern [link] [comments]  ( 8 min )
    SAC underactuated pendulum problem
    I'm currently working on a project involving the underactuated pendulum problem, specifically the 'unbalanced disk'. You can find the code base here. I am using the reward function of Pendulum-v1. I've had success solving the problem with DQN, and improved its performance further using hyperparameter optimization; this worked fine. However, I would like to solve this environment with SAC as well. You can find the SAC implementation I'm using here; I changed small things to make the environment work and added mixed-precision training to speed up training. Here is an image of the environment getting stuck in that position; the black arrow shows the direction of the force being applied. https://preview.redd.it/uuwtqvbjgxbb1.png?width=475&format=png&auto=webp&s=7a13693d5a84339726edb9b394a0fca5c9f5bc35 My main challenge right now is that the SAC algorithm does not converge to the desired result: rather than reaching the top of the pendulum swing as intended, it settles at the side position. I understand the issue is probably that it has to swing up first. However, DQN was capable of doing this, so I wonder why SAC wouldn't be. I've been running a series of hyperparameter optimizations in an attempt to find a combination that can solve this environment, but nothing has worked so far. Here are the ranges I've been using for the search space:
    lr = trial.suggest_float('lr', 1e-5, 1e-4, log=True)
    batch = trial.suggest_categorical('batch', [32, 64, 128, 256])
    gamma = trial.suggest_float('gamma', 0.90, 0.999)
    alpha = trial.suggest_float('alpha', 0.01, 0.5)
    polyak = trial.suggest_float('polyak', 0.01, 0.9)
    If someone has pointers for solving this, please let me know! Most learning curves look like this as well: https://preview.redd.it/dj5sp7ikhxbb1.png?width=566&format=png&auto=webp&s=5b50c98a82a4a2d9499ea6bc87132f3b4424da99 submitted by /u/r3ktIKevin [link] [comments]  ( 9 min )
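    For reference, a minimal Optuna objective wrapping those suggestions might look like the sketch below; train_sac is a hypothetical stand-in for whatever runs a full SAC training and returns the mean evaluation return, and is not part of the original post:
    import optuna

    def objective(trial):
        lr = trial.suggest_float('lr', 1e-5, 1e-4, log=True)
        batch = trial.suggest_categorical('batch', [32, 64, 128, 256])
        gamma = trial.suggest_float('gamma', 0.90, 0.999)
        alpha = trial.suggest_float('alpha', 0.01, 0.5)
        polyak = trial.suggest_float('polyak', 0.01, 0.9)
        # train_sac is hypothetical: train with these settings, return mean eval return
        return train_sac(lr=lr, batch_size=batch, gamma=gamma,
                         alpha=alpha, polyak=polyak)

    study = optuna.create_study(direction='maximize')
    study.optimize(objective, n_trials=100)
    One judgment call worth trying before more search: many SAC implementations tune the entropy temperature alpha automatically against a target entropy rather than fixing it, which often helps escape the side-resting local optimum in swing-up tasks.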
  • Open

    Large language models and mnemonics
    The Major mnemonic system encodes numbers as words in order to make them easier to remember. Digits correspond to consonant sounds (not spellings) as explained here. You can use the system ad hoc, improvising an encoding of a word as needed, or you can memorize canonical encodings of numbers, also known as pegs. Pegs have […] Large language models and mnemonics first appeared on John D. Cook.  ( 7 min )
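    For concreteness, here is a minimal sketch of the conventional Major-system digit-to-consonant table (the encoding is by sound, so real use needs phonetic lookup rather than spelling; the helper name is illustrative):
    # Conventional Major-system mapping from digits to consonant sounds.
    MAJOR = {
        '0': 's/z', '1': 't/d', '2': 'n', '3': 'm', '4': 'r',
        '5': 'l', '6': 'j/sh/ch', '7': 'k/g', '8': 'f/v', '9': 'p/b',
    }

    def consonant_pattern(number: str) -> list:
        """Digits of a number -> consonant sounds a peg word must contain."""
        return [MAJOR[d] for d in number]

    print(consonant_pattern('314'))  # ['m', 't/d', 'r'] -> e.g. "meteor"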
    When does a function have an addition theorem?
    Motivating examples The addition theorem for cosine says that cos(x + y) = cos x cos y - sin x sin y, and the addition theorem for hyperbolic cosine is analogous, though with a sign change. An addition theorem is a theorem that relates a function's value at x + y to its values at x and at y. The squaring function satisfies a very simple addition theorem […] When does a function have an addition theorem? first appeared on John D. Cook.  ( 6 min )
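    For reference, the analogous identities the excerpt alludes to are standard results; in LaTeX:
    \begin{aligned}
    \cosh(x+y) &= \cosh x \cosh y + \sinh x \sinh y \\
    (x+y)^2    &= x^2 + 2xy + y^2
    \end{aligned}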
  • Open

    AI helps household robots cut planning time in half
    PIGINet leverages machine learning to streamline and enhance household robots' task and motion planning, by assessing and filtering feasible solutions in complex environments.  ( 9 min )
    Study finds ChatGPT boosts worker productivity for some writing tasks
    A new report by MIT researchers highlights the potential of generative AI to help workers with certain writing assignments.  ( 9 min )
    A new way to look at data privacy
    Researchers create a privacy technique that protects sensitive data while maintaining a machine-learning model’s performance.  ( 10 min )
  • Open

    How Do Companies Use Artificial Intelligence?
    By now, AI-based tools have changed the way companies operate across all industries, as businesses use AI to streamline operations, make informed decisions, and enhance customer experiences. Companies utilize AI in a multitude of ways, such as automating repetitive tasks, predicting customer behavior, and optimizing supply chain management. Today, we will dive… The post How Do Companies Use Artificial Intelligence? appeared first on Data Science Central.  ( 21 min )
  • Open

    Training Diffusion Models with Reinforcement Learning
    Diffusion models have recently emerged as the de facto standard for generating complex, high-dimensional outputs. You may know them for their ability to produce stunning AI art and hyper-realistic synthetic images, but they have also found success in oth…  ( 7 min )
  • Open

    Prospective Learning: Principled Extrapolation to the Future. (arXiv:2201.07372v2 [cs.LG] UPDATED)
    Learning is a process which can update decision rules, based on past experience, such that future performance improves. Traditionally, machine learning is often evaluated under the assumption that the future will be identical to the past in distribution or change adversarially. But these assumptions can be either too optimistic or pessimistic for many problems in the real world. Real world scenarios evolve over multiple spatiotemporal scales with partially predictable dynamics. Here we reformulate the learning problem to one that centers around this idea of dynamic futures that are partially learnable. We conjecture that certain sequences of tasks are not retrospectively learnable (in which the data distribution is fixed), but are prospectively learnable (in which distributions may be dynamic), suggesting that prospective learning is more difficult in kind than retrospective learning. We argue that prospective learning more accurately characterizes many real world problems that (1) currently stymie existing artificial intelligence solutions and/or (2) lack adequate explanations for how natural intelligences solve them. Thus, studying prospective learning will lead to deeper insights and solutions to currently vexing challenges in both natural and artificial intelligences.  ( 3 min )
    Provably Faster Gradient Descent via Long Steps. (arXiv:2307.06324v2 [math.OC] UPDATED)
    This work establishes provably faster convergence rates for gradient descent via a computer-assisted analysis technique. Our theory allows nonconstant stepsize policies with frequent long steps potentially violating descent by analyzing the overall effect of many iterations at once rather than the typical one-iteration inductions used in most first-order method analyses. We show that long steps, which may increase the objective value in the short term, lead to provably faster convergence in the long term. A conjecture towards proving a faster $O(1/T\log T)$ rate for gradient descent is also motivated along with simple numerical validation.
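    To make the flavor of the result concrete, here is a toy illustration (not the paper's certified stepsize schedule; the pattern below is made up): gradient descent on an ill-conditioned quadratic where an occasional long step, which can overshoot in the high-curvature direction, still speeds progress overall:
    import numpy as np

    A = np.diag([1.0, 10.0])      # f(x) = 0.5 * x^T A x, smoothness constant L = 10
    L = 10.0
    x = np.array([1.0, 1.0])
    for t in range(100):
        grad = A @ x
        # every 7th step is "long" (violates the classical descent condition)
        step = 2.9 / L if t % 7 == 3 else 0.9 / L
        x = x - step * grad
    print(0.5 * x @ A @ x)        # objective value after 100 iterations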
    Revisiting Discrete Soft Actor-Critic. (arXiv:2209.10081v3 [cs.LG] UPDATED)
    We study the adaptation of soft actor-critic (SAC) from continuous action space to discrete action space. We revisit vanilla SAC and provide an in-depth understanding of its Q-value underestimation and performance instability issues when applied to discrete settings. We thereby propose entropy-penalty and double average Q-learning with Q-clip to address these issues. Extensive experiments on typical benchmarks with discrete action space, including Atari games and a large-scale MOBA game, show the efficacy of our proposed method. Our code is at https://github.com/coldsummerday/Revisiting-Discrete-SAC.  ( 2 min )
    Personalized Anomaly Detection in PPG Data using Representation Learning and Biometric Identification. (arXiv:2307.06380v1 [cs.LG])
    Photoplethysmography (PPG) signals, typically acquired from wearable devices, hold significant potential for continuous fitness-health monitoring. In particular, heart conditions that manifest as rare and subtle deviations from typical heart patterns are of special interest. However, robust and reliable anomaly detection within these data remains a challenge due to the scarcity of labeled data and high inter-subject variability. This paper introduces a two-stage framework leveraging representation learning and personalization to improve anomaly detection performance in PPG data. The proposed framework first employs representation learning to transform the original PPG signals into a more discriminative and compact representation. We then apply three different unsupervised anomaly detection methods for movement detection and biometric identification. We validate our approach using two different datasets in both generalized and personalized scenarios. The results show that representation learning significantly improves anomaly detection performance while reducing the high inter-subject variability. Personalized models further enhance anomaly detection performance, underscoring the role of personalization in PPG-based fitness-health monitoring systems. The results from biometric identification show that it is easier to distinguish a new user from a single intended authorized user than from a group of users. Overall, this study provides evidence of the effectiveness of representation learning and personalization for anomaly detection in PPG data.  ( 2 min )
    On the Connection between Game-Theoretic Feature Attributions and Counterfactual Explanations. (arXiv:2307.06941v1 [cs.AI])
    Explainable Artificial Intelligence (XAI) has received widespread interest in recent years, and two of the most popular types of explanations are feature attributions and counterfactual explanations. These classes of approaches have been largely studied independently, and the few attempts at reconciling them have been primarily empirical. This work establishes a clear theoretical connection between game-theoretic feature attributions, focusing on but not limited to SHAP, and counterfactual explanations. After motivating operative changes to Shapley-value-based feature attributions and counterfactual explanations, we prove that, under certain conditions, they are in fact equivalent. We then extend the equivalency result to game-theoretic solution concepts beyond Shapley values. Moreover, through the analysis of the conditions of such equivalence, we shed light on the limitations of naively using counterfactual explanations to provide feature importances. Experiments on three datasets quantitatively show the difference in explanations at every stage of the connection between the two approaches and corroborate the theoretical findings.
    Control Transformer: Robot Navigation in Unknown Environments through PRM-Guided Return-Conditioned Sequence Modeling. (arXiv:2211.06407v3 [cs.RO] UPDATED)
    Learning long-horizon tasks such as navigation has presented difficult challenges for successfully applying reinforcement learning to robotics. From another perspective, under known environments, sampling-based planning can robustly find collision-free paths in environments without learning. In this work, we propose Control Transformer that models return-conditioned sequences from low-level policies guided by a sampling-based Probabilistic Roadmap (PRM) planner. We demonstrate that our framework can solve long-horizon navigation tasks using only local information. We evaluate our approach on partially-observed maze navigation with MuJoCo robots, including Ant, Point, and Humanoid. We show that Control Transformer can successfully navigate through mazes and transfer to unknown environments. Additionally, we apply our method to a differential drive robot (Turtlebot3) and show zero-shot sim2real transfer under noisy observations.
    Hybrid Control Policy for Artificial Pancreas via Ensemble Deep Reinforcement Learning. (arXiv:2307.06501v1 [cs.AI])
    Objective: The artificial pancreas (AP) has shown promising potential in achieving closed-loop glucose control for individuals with type 1 diabetes mellitus (T1DM). However, designing an effective control policy for the AP remains challenging due to the complex physiological processes, delayed insulin response, and inaccurate glucose measurements. While model predictive control (MPC) offers safety and stability through the dynamic model and safety constraints, it lacks individualization and is adversely affected by unannounced meals. Conversely, deep reinforcement learning (DRL) provides personalized and adaptive strategies but faces challenges with distribution shifts and substantial data requirements. Methods: We propose a hybrid control policy for the artificial pancreas (HyCPAP) to address the above challenges. HyCPAP combines an MPC policy with an ensemble DRL policy, leveraging the strengths of both policies while compensating for their respective limitations. To facilitate faster deployment of AP systems in real-world settings, we further incorporate meta-learning techniques into HyCPAP, leveraging previous experience and patient-shared knowledge to enable fast adaptation to new patients with limited available data. Results: We conduct extensive experiments using the FDA-accepted UVA/Padova T1DM simulator across three scenarios. Our approaches achieve the highest percentage of time spent in the desired euglycemic range and the lowest occurrences of hypoglycemia. Conclusion: The results clearly demonstrate the superiority of our methods for closed-loop glucose management in individuals with T1DM. Significance: The study presents novel control policies for AP systems, affirming the great potential of proposed methods for efficient closed-loop glucose control.  ( 3 min )
    Learning IMM Filter Parameters from Measurements using Gradient Descent. (arXiv:2307.06618v1 [cs.LG])
    The performance of data fusion and tracking algorithms often depends on parameters that not only describe the sensor system, but can also be task-specific. While for the sensor system tuning these variables is time-consuming and mostly requires expert knowledge, intrinsic parameters of targets under track can even be completely unobservable until the system is deployed. With state-of-the-art sensor systems growing more and more complex, the number of parameters naturally increases, necessitating the automatic optimization of the model variables. In this paper, the parameters of an interacting multiple model (IMM) filter are optimized solely using measurements, thus without necessity for any ground-truth data. The resulting method is evaluated through an ablation study on simulated data, where the trained model manages to match the performance of a filter parametrized with ground-truth values.
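    As a hedged, single-model stand-in for the idea (the paper tunes an IMM filter; here a plain 1-D Kalman filter with a random-walk model takes its place), filter noise parameters can be fit by gradient descent on the innovation log-likelihood using measurements alone:
    import torch

    z = torch.randn(200).cumsum(0) + 0.5 * torch.randn(200)  # synthetic measurements
    log_q = torch.zeros((), requires_grad=True)   # log process-noise variance
    log_r = torch.zeros((), requires_grad=True)   # log measurement-noise variance
    opt = torch.optim.Adam([log_q, log_r], lr=0.05)

    for epoch in range(200):
        q, r = log_q.exp(), log_r.exp()
        x, p, nll = z[0], torch.tensor(1.0), torch.tensor(0.0)
        for zk in z[1:]:
            p = p + q                 # predict under a random-walk motion model
            s = p + r                 # innovation variance
            nll = nll + 0.5 * ((zk - x) ** 2 / s + torch.log(s))
            gain = p / s              # Kalman gain
            x = x + gain * (zk - x)   # measurement update
            p = (1 - gain) * p
        opt.zero_grad(); nll.backward(); opt.step()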
    Filling time-series gaps using image techniques: Multidimensional context autoencoder approach for building energy data imputation. (arXiv:2307.05926v2 [cs.LG] UPDATED)
    Building energy prediction and management has become increasingly important in recent decades, driven by the growth of Internet of Things (IoT) devices and the availability of more energy data. However, energy data is often collected from multiple sources and can be incomplete or inconsistent, which can hinder accurate predictions and management of energy systems and limit the usefulness of the data for decision-making and research. To address this issue, past studies have focused on imputing missing gaps in energy data, including random and continuous gaps. One of the main challenges in this area is the lack of validation on a benchmark dataset with various building and meter types, making it difficult to accurately evaluate the performance of different imputation methods. Another challenge is the lack of application of state-of-the-art imputation methods for missing gaps in energy data. Contemporary image-inpainting methods, such as Partial Convolution (PConv), have been widely used in the computer vision domain and have demonstrated their effectiveness in dealing with complex missing patterns. To study whether energy data imputation can benefit from image-based deep learning methods, this study compared PConv, Convolutional Neural Networks (CNNs), and the weekly persistence method using one of the biggest publicly available whole-building energy datasets, consisting of 1479 power meters worldwide, as the benchmark. The results show that, compared to the CNN with the raw time series (1D-CNN) and the weekly persistence method, neural network models with energy data reshaped into two dimensions reduced the Mean Squared Error (MSE) by 10% to 30%. The advanced deep learning method, Partial Convolution (PConv), reduced the MSE by a further 20-30% compared to the 2D-CNN and stands out among all models.
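    A sketch of the 1-D-to-2-D reshaping the abstract describes (the exact layout in the paper may differ): hourly meter readings become a weeks-by-hour-of-week "image" so that 2-D convolutions, or an inpainting model such as PConv, can exploit daily and weekly periodicity:
    import numpy as np

    hourly = np.random.rand(24 * 7 * 52)   # stand-in: one year of hourly readings
    hourly[1000:1100] = np.nan             # a continuous gap to impute
    image = hourly.reshape(52, 24 * 7)     # rows = weeks, cols = hour of week
    mask = ~np.isnan(image)                # PConv-style validity mask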
    Efficient SGD Neural Network Training via Sublinear Activated Neuron Identification. (arXiv:2307.06565v1 [cs.LG])
    Deep learning has been widely used in many fields, but the model training process usually consumes massive computational resources and time. Therefore, designing an efficient neural network training method with a provable convergence guarantee is a fundamental and important research question. In this paper, we present a static half-space report data structure that consists of a fully connected two-layer neural network for shifted ReLU activation to enable activated neuron identification in sublinear time via geometric search. We also prove that our algorithm can converge in $O(M^2/\epsilon^2)$ time with network size quadratic in the coefficient norm upper bound $M$ and error term $\epsilon$.
    Balanced Coarsening for Multilevel Hypergraph Partitioning via Wasserstein Discrepancy. (arXiv:2106.07501v2 [cs.LG] UPDATED)
    We propose a balanced coarsening scheme for multilevel hypergraph partitioning. In addition, an initial partitioning algorithm is designed to improve the quality of k-way hypergraph partitioning. By assigning vertex weights through the LPT algorithm, we generate a prior hypergraph under a relaxed balance constraint. With the prior hypergraph, we define the Wasserstein discrepancy to coordinate the optimal transport of the coarsening process, and the optimal transport matrix is solved by the Sinkhorn algorithm. Our coarsening scheme fully takes into account the minimization of the connectivity metric (objective function). For the initial partitioning stage, we define a normalized cut function induced by a Fiedler vector, which is theoretically proved to be a concave function. Thereby, a three-point algorithm is designed to find the best cut under the balance constraint.
    EfficientNet Algorithm for Classification of Different Types of Cancer. (arXiv:2304.08715v3 [eess.IV] UPDATED)
    Accurate and efficient classification of different types of cancer is critical for early detection and effective treatment. In this paper, we present the results of our experiments using the EfficientNet algorithm for classification of brain tumor, breast cancer mammography, chest cancer, and skin cancer. We used publicly available datasets and preprocessed the images to ensure consistency and comparability. Our experiments show that the EfficientNet algorithm achieved high accuracy, precision, recall, and F1 scores on each of the cancer datasets, outperforming other state-of-the-art algorithms in the literature. We also discuss the strengths and weaknesses of the EfficientNet algorithm and its potential applications in clinical practice. Our results suggest that the EfficientNet algorithm is well-suited for classification of different types of cancer and can be used to improve the accuracy and efficiency of cancer diagnosis.
    Differentially Private Synthetic Data Generation via Lipschitz-Regularised Variational Autoencoders. (arXiv:2304.11336v2 [cs.LG] UPDATED)
    Synthetic data has been hailed as the silver bullet for privacy preserving data analysis. If a record is not real, then how could it violate a person's privacy? In addition, deep-learning based generative models are employed successfully to approximate complex high-dimensional distributions from data and draw realistic samples from this learned distribution. It is often overlooked though that generative models are prone to memorising many details of individual training records and often generate synthetic data that too closely resembles the underlying sensitive training data, hence violating strong privacy regulations as, e.g., encountered in health care. Differential privacy is the well-known state-of-the-art framework for guaranteeing protection of sensitive individuals' data, allowing aggregate statistics and even machine learning models to be released publicly without compromising privacy. The training mechanisms however often add too much noise during the training process, and thus severely compromise the utility of these private models. Even worse, the tight privacy budgets do not allow for many training epochs so that model quality cannot be properly controlled in practice. In this paper we explore an alternative approach for privately generating data that makes direct use of the inherent stochasticity in generative models, e.g., variational autoencoders. The main idea is to appropriately constrain the continuity modulus of the deep models instead of adding another noise mechanism on top. For this approach, we derive mathematically rigorous privacy guarantees and illustrate its effectiveness with practical experiments.
    Large Language Models for Supply Chain Optimization. (arXiv:2307.03875v2 [cs.AI] UPDATED)
    Supply chain operations traditionally involve a variety of complex decision making problems. Over the last few decades, supply chains greatly benefited from advances in computation, which allowed the transition from manual processing to automation and cost-effective optimization. Nonetheless, business operators still need to spend substantial efforts in explaining and interpreting the optimization outcomes to stakeholders. Motivated by the recent advances in Large Language Models (LLMs), we study how this disruptive technology can help bridge the gap between supply chain automation and human comprehension and trust thereof. We design OptiGuide -- a framework that accepts as input queries in plain text, and outputs insights about the underlying optimization outcomes. Our framework does not forgo the state-of-the-art combinatorial optimization technology, but rather leverages it to quantitatively answer what-if scenarios (e.g., how would the cost change if we used supplier B instead of supplier A for a given demand?). Importantly, our design does not require sending proprietary data over to LLMs, which can be a privacy concern in some circumstances. We demonstrate the effectiveness of our framework on a real server placement scenario within Microsoft's cloud supply chain. Along the way, we develop a general evaluation benchmark, which can be used to evaluate the accuracy of the LLM output in other scenarios.
    Data Augmentation in Training CNNs: Injecting Noise to Images. (arXiv:2307.06855v1 [cs.CV])
    Noise injection is a fundamental tool for data augmentation, and yet there is no widely accepted procedure to incorporate it with learning frameworks. This study analyzes the effects of adding or applying different noise models of varying magnitudes to Convolutional Neural Network (CNN) architectures. Noise models that are distributed with different density functions are given common magnitude levels via the Structural Similarity (SSIM) metric in order to create an appropriate ground for comparison. The basic results conform with most of the common notions in machine learning, and the study also introduces some novel heuristics and recommendations on noise injection. The new approaches will provide a better understanding of optimal learning procedures for image classification.
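    A hedged sketch of one way to equalize magnitudes via SSIM (the paper's exact calibration procedure may differ): bisect the Gaussian noise level until the corrupted image sits at a chosen SSIM target, so different noise models can be compared at a common level:
    import numpy as np
    from skimage.metrics import structural_similarity

    def sigma_for_target_ssim(img, target=0.8, lo=0.0, hi=1.0, iters=20):
        """Find a noise sigma whose Gaussian corruption yields ~target SSIM."""
        rng = np.random.default_rng(0)
        for _ in range(iters):
            mid = (lo + hi) / 2
            noisy = np.clip(img + rng.normal(0.0, mid, img.shape), 0.0, 1.0)
            s = structural_similarity(img, noisy, data_range=1.0)
            if s < target:
                hi = mid              # too noisy: reduce sigma
            else:
                lo = mid              # too clean: increase sigma
        return (lo + hi) / 2

    img = np.random.rand(64, 64)      # stand-in grayscale image in [0, 1]
    print(sigma_for_target_ssim(img))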
    Towards Safe Autonomous Driving Policies using a Neuro-Symbolic Deep Reinforcement Learning Approach. (arXiv:2307.01316v2 [cs.RO] UPDATED)
    The dynamic nature of driving environments and the presence of diverse road users pose significant challenges for decision-making in autonomous driving. Deep reinforcement learning (DRL) has emerged as a popular approach to tackle this problem. However, the application of existing DRL solutions is mainly confined to simulated environments due to safety concerns, impeding their deployment in real-world. To overcome this limitation, this paper introduces a novel neuro-symbolic model-free DRL approach, called DRL with Symbolic Logics (DRLSL) that combines the strengths of DRL (learning from experience) and symbolic first-order logics (knowledge-driven reasoning) to enable safe learning in real-time interactions of autonomous driving within real environments. This innovative approach provides a means to learn autonomous driving policies by actively engaging with the physical environment while ensuring safety. We have implemented the DRLSL framework in autonomous driving using the highD dataset and demonstrated that our method successfully avoids unsafe actions during both the training and testing phases. Furthermore, our results indicate that DRLSL achieves faster convergence during training and exhibits better generalizability to new driving scenarios compared to traditional DRL methods.
    PIGEON: Predicting Image Geolocations. (arXiv:2307.05845v2 [cs.CV] UPDATED)
    We introduce PIGEON, a multi-task end-to-end system for planet-scale image geolocalization that achieves state-of-the-art performance on both external benchmarks and in human evaluation. Our work incorporates semantic geocell creation with label smoothing, conducts pretraining of a vision transformer on images with geographic information, and refines location predictions with ProtoNets across a candidate set of geocells. The contributions of PIGEON are three-fold: first, we design a semantic geocell creation and splitting algorithm based on open-source data which can be adapted to any geospatial dataset. Second, we show the effectiveness of intra-geocell refinement and the applicability of unsupervised clustering and ProtoNets to the task. Finally, we make our pre-trained CLIP transformer model, StreetCLIP, publicly available for use in adjacent domains with applications to fighting climate change and urban and rural scene understanding.
    Differentially Private Decoupled Graph Convolutions for Multigranular Topology Protection. (arXiv:2307.06422v1 [cs.LG])
    Graph learning methods, such as Graph Neural Networks (GNNs) based on graph convolutions, are highly successful in solving real-world learning problems involving graph-structured data. However, graph learning methods expose sensitive user information and interactions not only through their model parameters but also through their model predictions. Consequently, standard Differential Privacy (DP) techniques that merely offer model weight privacy are inadequate. This is especially the case for node predictions that leverage neighboring node attributes directly via graph convolutions that create additional risks of privacy leakage. To address this problem, we introduce Graph Differential Privacy (GDP), a new formal DP framework tailored to graph learning settings that ensures both provably private model parameters and predictions. Furthermore, since there may be different privacy requirements for the node attributes and graph structure, we introduce a novel notion of relaxed node-level data adjacency. This relaxation can be used for establishing guarantees for different degrees of graph topology privacy while maintaining node attribute privacy. Importantly, this relaxation reveals a useful trade-off between utility and topology privacy for graph learning methods. In addition, our analysis of GDP reveals that existing DP-GNNs fail to exploit this trade-off due to the complex interplay between graph topology and attribute data in standard graph convolution designs. To mitigate this problem, we introduce the Differentially Private Decoupled Graph Convolution (DPDGC) model, which benefits from decoupled graph convolution while providing GDP guarantees. Extensive experiments on seven node classification benchmarking datasets demonstrate the superior privacy-utility trade-off of DPDGC over existing DP-GNNs based on standard graph convolution design.  ( 3 min )
    A Causal Framework to Unify Common Domain Generalization Approaches. (arXiv:2307.06825v1 [cs.LG])
    Domain generalization (DG) is about learning models that generalize well to new domains that are related to, but different from, the training domain(s). It is a fundamental problem in machine learning and has attracted much attention in recent years. A large number of approaches have been proposed. Different approaches are motivated from different perspectives, making it difficult to gain an overall understanding of the area. In this paper, we propose a causal framework for domain generalization and present an understanding of common DG approaches in the framework. Our work sheds new light on the following questions: (1) What are the key ideas behind each DG method? (2) Why is it expected to improve generalization to new domains theoretically? (3) How are different DG methods related to each other, and what are their relative advantages and limitations? By providing a unified perspective on DG, we hope to help researchers better understand the underlying principles and develop more effective approaches for this critical problem in machine learning.
    Full-resolution Lung Nodule Segmentation from Chest X-ray Images using Residual Encoder-Decoder Networks. (arXiv:2307.06547v1 [eess.IV])
    Lung cancer is the leading cause of cancer death, and early diagnosis is associated with a positive prognosis. Chest X-ray (CXR) provides an inexpensive imaging mode for lung cancer diagnosis. Suspicious nodules are difficult to distinguish from vascular and bone structures using CXR. Computer vision has previously been proposed to assist human radiologists in this task; however, leading studies use down-sampled images and computationally expensive methods with unproven generalization. Instead, this study localizes lung nodules using efficient encoder-decoder neural networks that process full-resolution images to avoid any signal loss resulting from down-sampling. Encoder-decoder networks are trained and tested using the JSRT lung nodule dataset. The networks are used to localize lung nodules from an independent external CXR dataset. Sensitivity and false positive rates are measured using an automated framework to eliminate any observer subjectivity. These experiments allow for the determination of the optimal network depth, image resolution and pre-processing pipeline for generalized lung nodule localization. We find that nodule localization is influenced by subtlety, with more subtle nodules being detected in earlier training epochs. Therefore, we propose a novel self-ensemble model from three consecutive epochs centered on the validation optimum. This ensemble achieved a sensitivity of 85% in 10-fold internal testing with false positives of 8 per image. A sensitivity of 81% is achieved at a false positive rate of 6 per image following morphological false positive reduction. This result is comparable to more computationally complex systems based on linear and spatial filtering, but with a sub-second inference time that is faster than other methods. The proposed algorithm achieved excellent generalization results against an external dataset with a sensitivity of 77% at a false positive rate of 7.6 per image.  ( 3 min )
    Parallel bootstrap-based on-policy deep reinforcement learning for continuous flow control applications. (arXiv:2304.12330v3 [cs.LG] UPDATED)
    The coupling of deep reinforcement learning to numerical flow control problems has recently received considerable attention, leading to groundbreaking results and opening new perspectives for the domain. Due to the usually high computational cost of fluid dynamics solvers, the use of parallel environments during the learning process represents an essential ingredient to attain efficient control in a reasonable time. Yet, most of the deep reinforcement learning literature for flow control relies on on-policy algorithms, for which the massively parallel transition collection may break theoretical assumptions and lead to suboptimal control models. To overcome this issue, we propose a parallelism pattern relying on partial-trajectory buffers terminated by a return bootstrapping step, allowing a flexible use of parallel environments while preserving the on-policiness of the updates. This approach is illustrated on a CPU-intensive continuous flow control problem from the literature.  ( 2 min )
    On Collaboration in Distributed Parameter Estimation with Resource Constraints. (arXiv:2307.06442v1 [cs.LG])
    We study sensor/agent data collection and collaboration policies for parameter estimation, accounting for resource constraints and correlation between observations collected by distinct sensors/agents. Specifically, we consider a group of sensors/agents, each of which samples from different variables of a multivariate Gaussian distribution and has different estimation objectives, and we formulate a sensor/agent's data collection and collaboration policy design problem as a Fisher information maximization (or Cramér-Rao bound minimization) problem. When knowledge of the correlation between variables is available, we analytically identify two particular scenarios: (1) where the knowledge of the correlation between samples cannot be leveraged for collaborative estimation purposes and (2) where the optimal data collection policy involves investing scarce resources to collaboratively sample and transfer information that is not of immediate interest and whose statistics are already known, with the sole goal of increasing the confidence on the estimate of the parameter of interest. When the knowledge of certain correlations is unavailable but collaboration may still be worthwhile, we propose novel ways to apply multi-armed bandit algorithms to learn the optimal data collection and collaboration policy in our distributed parameter estimation problem and demonstrate that the proposed algorithms, DOUBLE-F, DOUBLE-Z, UCB-F, UCB-Z, are effective through simulations.
    Efficient Task Offloading Algorithm for Digital Twin in Edge/Cloud Computing Environment. (arXiv:2307.05888v2 [cs.LG] UPDATED)
    In the era of the Internet of Things (IoT), the Digital Twin (DT) is envisioned to empower various areas as a bridge between physical objects and the digital world. Through virtualization and simulation techniques, multiple functions can be achieved by leveraging computing resources. In this process, Mobile Cloud Computing (MCC) and Mobile Edge Computing (MEC) have become two of the key factors in achieving real-time feedback. However, current works consider only edge servers or cloud servers in their DT system models, and these models ignore DTs that rely on more than one data source. In this paper, we propose a new DT system model considering a heterogeneous MEC/MCC environment. Each DT in the model is maintained in one of the servers via multiple data collection devices. The offloading decision-making problem is also considered, and a new offloading scheme is proposed based on Distributed Deep Learning (DDL). Simulation results demonstrate that our proposed algorithm can effectively and efficiently decrease the system's average latency and energy consumption. Significant improvement is achieved compared with the baselines under the dynamic environment of DTs.
    Aeolus Ocean -- A simulation environment for the autonomous COLREG-compliant navigation of Unmanned Surface Vehicles using Deep Reinforcement Learning and Maritime Object Detection. (arXiv:2307.06688v1 [cs.RO])
    Heading towards navigational autonomy in unmanned surface vehicles (USVs) in the maritime sector can fundamentally lead towards safer waters as well as reduced operating costs, while also providing a range of exciting new capabilities for oceanic research, exploration and monitoring. However, achieving such a goal is challenging. USV control systems must, safely and reliably, be able to adhere to the international regulations for preventing collisions at sea (COLREGs) in encounters with other vessels as they navigate to a given waypoint while being affected by realistic weather conditions, either during the day or at night. To deal with the multitude of possible scenarios, it is critical to have a virtual environment that is able to replicate the realistic operating conditions USVs will encounter, before they can be implemented in the real world. Such "digital twins" form the foundations upon which Deep Reinforcement Learning (DRL) and Computer Vision (CV) algorithms can be used to develop and guide USV control systems. In this paper we describe the novel development of a COLREG-compliant DRL-based collision avoidant navigational system with CV-based awareness in a realistic ocean simulation environment. The performance of the trained autonomous Agents resulting from this approach is evaluated in several successful navigations to set waypoints in both open sea and coastal encounters with other vessels. A binary executable version of the simulator with trained agents is available at https://github.com/aavek/Aeolus-Ocean
    RulE: Neural-Symbolic Knowledge Graph Reasoning with Rule Embedding. (arXiv:2210.14905v2 [cs.AI] UPDATED)
    Knowledge graph (KG) reasoning is an important problem for knowledge graphs. In this paper, we propose a novel and principled framework called RulE (which stands for Rule Embedding) to effectively leverage logical rules to enhance KG reasoning. Unlike knowledge graph embedding (KGE) methods, RulE learns rule embeddings from existing triplets and first-order rules by jointly representing entities, relations and logical rules in a unified embedding space. Based on the learned rule embeddings, a confidence score can be calculated for each rule, reflecting its consistency with the observed triplets. This allows us to perform logical rule inference in a soft way, thus alleviating the brittleness of logic. On the other hand, RulE injects prior logical rule information into the embedding space, enriching and regularizing the entity/relation embeddings. This makes KGE alone perform better too. RulE is conceptually simple and empirically effective. We conduct extensive experiments to verify each component of RulE. Results on multiple benchmarks reveal that our model outperforms the majority of existing embedding-based and rule-based approaches.
    EPiC-GAN: Equivariant Point Cloud Generation for Particle Jets. (arXiv:2301.08128v3 [hep-ph] UPDATED)
    With the vast data-collecting capabilities of current and future high-energy collider experiments, there is an increasing demand for computationally efficient simulations. Generative machine learning models enable fast event generation, yet so far these approaches are largely constrained to fixed data structures and rigid detector geometries. In this paper, we introduce EPiC-GAN - equivariant point cloud generative adversarial network - which can produce point clouds of variable multiplicity. This flexible framework is based on deep sets and is well suited for simulating sprays of particles called jets. The generator and discriminator utilize multiple EPiC layers with an interpretable global latent vector. Crucially, the EPiC layers do not rely on pairwise information sharing between particles, which leads to a significant speed-up over graph- and transformer-based approaches with more complex relation diagrams. We demonstrate that EPiC-GAN scales well to large particle multiplicities and achieves high generation fidelity on benchmark jet generation tasks.  ( 2 min )
    Improving and generalizing flow-based generative models with minibatch optimal transport. (arXiv:2302.00482v2 [cs.LG] UPDATED)
    Continuous normalizing flows (CNFs) are an attractive generative modeling technique, but they have been held back by limitations in their simulation-based maximum likelihood training. We introduce the generalized conditional flow matching (CFM) technique, a family of simulation-free training objectives for CNFs. CFM features a stable regression objective like that used to train the stochastic flow in diffusion models but enjoys the efficient inference of deterministic flow models. In contrast to both diffusion models and prior CNF training algorithms, CFM does not require the source distribution to be Gaussian or require evaluation of its density. A variant of our objective is optimal transport CFM (OT-CFM), which creates simpler flows that are more stable to train and lead to faster inference, as evaluated in our experiments. Furthermore, OT-CFM is the first method to compute dynamic OT in a simulation-free way. Training CNFs with CFM improves results on a variety of conditional and unconditional generation tasks, such as inferring single cell dynamics, unsupervised image translation, and Schrödinger bridge inference.  ( 2 min )
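    A minimal sketch of the plain conditional flow matching objective with an independent coupling (OT-CFM additionally pairs x0 and x1 through a minibatch optimal transport plan, omitted here; the toy target distribution is illustrative): regress a velocity network onto the constant straight-line velocity x1 - x0:
    import torch

    net = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.Tanh(),
                              torch.nn.Linear(64, 2))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)

    for step in range(1000):
        x0 = torch.randn(256, 2)               # source samples (standard Gaussian)
        x1 = torch.randn(256, 2) * 0.3 + 2.0   # stand-in target samples
        t = torch.rand(256, 1)
        xt = (1 - t) * x0 + t * x1             # point on the straight path
        v_target = x1 - x0                     # its (constant) velocity
        loss = ((net(torch.cat([t, xt], dim=1)) - v_target) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()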
    A New Formalism, Method and Open Issues for Zero-Shot Coordination. (arXiv:2106.06613v3 [cs.AI] UPDATED)
    In many coordination problems, independently reasoning humans are able to discover mutually compatible policies. In contrast, independently trained self-play policies are often mutually incompatible. Zero-shot coordination (ZSC) has recently been proposed as a new frontier in multi-agent reinforcement learning to address this fundamental issue. Prior work approaches the ZSC problem by assuming players can agree on a shared learning algorithm but not on labels for actions and observations, and proposes other-play as an optimal solution. However, until now, this "label-free" problem has only been informally defined. We formalize this setting as the label-free coordination (LFC) problem by defining the label-free coordination game. We show that other-play is not an optimal solution to the LFC problem as it fails to consistently break ties between incompatible maximizers of the other-play objective. We introduce an extension of the algorithm, other-play with tie-breaking, and prove that it is optimal in the LFC problem and an equilibrium in the LFC game. Since arbitrary tie-breaking is precisely what the ZSC setting aims to prevent, we conclude that the LFC problem does not reflect the aims of ZSC. To address this, we introduce an alternative informal operationalization of ZSC as a starting point for future work.  ( 2 min )
    TRUST-LAPSE: An Explainable and Actionable Mistrust Scoring Framework for Model Monitoring. (arXiv:2207.11290v2 [cs.LG] UPDATED)
    Continuous monitoring of trained ML models to determine when their predictions should and should not be trusted is essential for their safe deployment. Such a framework ought to be high-performing, explainable, post-hoc and actionable. We propose TRUST-LAPSE, a "mistrust" scoring framework for continuous model monitoring. We assess the trustworthiness of each input sample's model prediction using a sequence of latent-space embeddings. Specifically, (a) our latent-space mistrust score estimates mistrust using distance metrics (Mahalanobis distance) and similarity metrics (cosine similarity) in the latent-space and (b) our sequential mistrust score determines deviations in correlations over the sequence of past input representations in a non-parametric, sliding-window based algorithm for actionable continuous monitoring. We evaluate TRUST-LAPSE via two downstream tasks: (1) distributionally shifted input detection, and (2) data drift detection. We evaluate across diverse domains - audio and vision using public datasets and further benchmark our approach on challenging, real-world electroencephalograms (EEG) datasets for seizure detection. Our latent-space mistrust scores achieve state-of-the-art results with AUROCs of 84.1 (vision), 73.9 (audio), and 77.1 (clinical EEGs), outperforming baselines by over 10 points. We expose critical failures in popular baselines that remain insensitive to input semantic content, rendering them unfit for real-world model monitoring. We show that our sequential mistrust scores achieve high drift detection rates; over 90% of the streams show < 20% error for all domains. Through extensive qualitative and quantitative evaluations, we show that our mistrust scores are more robust and provide explainability for easy adoption into practice.  ( 3 min )
    A Survey on Transformers in Reinforcement Learning. (arXiv:2301.03044v2 [cs.LG] UPDATED)
    Transformer has been considered the dominant neural architecture in NLP and CV, mostly under supervised settings. Recently, a similar surge of using Transformers has appeared in the domain of reinforcement learning (RL), but it is faced with unique design choices and challenges brought by the nature of RL. However, the evolution of Transformers in RL has not yet been well unraveled. In this paper, we seek to systematically review motivations and progress on using Transformers in RL, provide a taxonomy on existing works, discuss each sub-field, and summarize future prospects.
    Hiding in Plain Sight: Differential Privacy Noise Exploitation for Evasion-resilient Localized Poisoning Attacks in Multiagent Reinforcement Learning. (arXiv:2307.00268v2 [cs.LG] UPDATED)
    Lately, differential privacy (DP) has been introduced in cooperative multiagent reinforcement learning (CMARL) to safeguard the agents' privacy against adversarial inference during knowledge sharing. Nevertheless, we argue that the noise introduced by DP mechanisms may inadvertently give rise to a novel poisoning threat, specifically in the context of private knowledge sharing during CMARL, which remains unexplored in the literature. To address this shortcoming, we present an adaptive, privacy-exploiting, and evasion-resilient localized poisoning attack (PeLPA) that capitalizes on the inherent DP-noise to circumvent anomaly detection systems and hinder the optimal convergence of the CMARL model. We rigorously evaluate our proposed PeLPA attack in diverse environments, encompassing both non-adversarial and multiple-adversarial contexts. Our findings reveal that, in a medium-scale environment, the PeLPA attack with attacker ratios of 20% and 40% can lead to an increase in average steps to goal by 50.69% and 64.41%, respectively. Furthermore, under similar conditions, PeLPA can result in a 1.4x and 1.6x computational time increase in optimal reward attainment and a 1.18x and 1.38x slower convergence for attacker ratios of 20% and 40%, respectively.
    Human Biophysics as Network Weights: Conditional Generative Models for Dynamic Simulation. (arXiv:2211.01856v3 [cs.LG] UPDATED)
    Simulations of biophysical systems are fundamental for studying physiological mechanisms and developing human machine interfaces. Whilst advanced numerical methods, such as finite element models, can excel in this task, they are extremely computationally expensive to use when generating a large number of simulations or simulating dynamic events with continuously changing structural parameters. We propose an architecture that uses a conditional generative model to interpolate between the numerical model states, dramatically lowering the modeling time while maintaining a high generation accuracy. As a demonstration of this concept, we present BioMime, a hybrid-structured generative model that enables an accurate, ultra-fast, and arbitrarily high temporal-resolution simulation of a specific biophysical system during dynamic changes. This methodology has wide applications in physiological and clinical research as well as in supporting data augmentation strategies for signal analysis, representing a computationally efficient and highly accurate model for biophysical simulations.
    Robust online active learning. (arXiv:2302.00422v5 [stat.ML] UPDATED)
    In many industrial applications, obtaining labeled observations is not straightforward as it often requires the intervention of human experts or the use of expensive testing equipment. In these circumstances, active learning can be highly beneficial in suggesting the most informative data points to be used when fitting a model. Reducing the number of observations needed for model development alleviates both the computational burden required for training and the operational expenses related to labeling. Online active learning, in particular, is useful in high-volume production processes where the decision about the acquisition of the label for a data point needs to be taken within an extremely short time frame. However, despite the recent efforts to develop online active learning strategies, the behavior of these methods in the presence of outliers has not been thoroughly examined. In this work, we investigate the performance of online active linear regression in contaminated data streams. Our study shows that the currently available query strategies are prone to sample outliers, whose inclusion in the training set eventually degrades the predictive performance of the models. To address this issue, we propose a solution that bounds the search area of a conditional D-optimal algorithm and uses a robust estimator. Our approach strikes a balance between exploring unseen regions of the input space and protecting against outliers. Through numerical simulations, we show that the proposed method is effective in improving the performance of online active learning in the presence of outliers, thus expanding the potential applications of this powerful tool.
    Learning Graph ARMA Processes from Time-Vertex Spectra. (arXiv:2302.06887v2 [stat.ML] UPDATED)
    The modeling of time-varying graph signals as stationary time-vertex stochastic processes permits the inference of missing signal values by efficiently employing the correlation patterns of the process across different graph nodes and time instants. In this study, we propose an algorithm for computing graph autoregressive moving average (graph ARMA) processes based on learning the joint time-vertex power spectral density of the process from its incomplete realizations for the task of signal interpolation. Our solution relies on first roughly estimating the joint spectrum of the process from partially observed realizations and then refining this estimate by projecting it onto the spectrum manifold of the graph ARMA process through convex relaxations. The initially missing signal values are then estimated based on the learnt model. Experimental results show that the proposed approach achieves high accuracy in time-vertex signal estimation problems.
    Personalization Disentanglement for Federated Learning: An explainable perspective. (arXiv:2306.03570v2 [cs.LG] UPDATED)
    Personalized federated learning (PFL) jointly trains a variety of local models through balancing between knowledge sharing across clients and model personalization per client. This paper addresses PFL by explicitly disentangling latent representations into two parts to capture the shared knowledge and client-specific personalization, which leads to more reliable and effective PFL. The disentanglement is achieved by a novel Federated Dual Variational Autoencoder (FedDVA), which employs two encoders to infer the two types of representations. FedDVA can produce a better understanding of the trade-off between global knowledge sharing and local personalization in PFL. Moreover, it can be integrated with existing FL methods and turn them into personalized models for heterogeneous downstream tasks. Extensive experiments validate the advantages caused by disentanglement and show that models trained with disentangled representations substantially outperform vanilla methods.
    Revisiting Generalized p-Laplacian Regularized Framelet GCNs: Convergence, Energy Dynamic and Training with Non-Linear Diffusion. (arXiv:2305.15639v3 [cs.LG] UPDATED)
    This paper presents a comprehensive theoretical analysis of the graph p-Laplacian regularized framelet network (pL-UFG) to establish a solid understanding of its properties. We conduct a convergence analysis on pL-UFG, addressing the gap in the understanding of its asymptotic behaviors. Further by investigating the generalized Dirichlet energy of pL-UFG, we demonstrate that the Dirichlet energy remains non-zero throughout convergence, ensuring the avoidance of over-smoothing issues. Additionally, we elucidate the energy dynamic perspective, highlighting the synergistic relationship between the implicit layer in pL-UFG and graph framelets. This synergy enhances the model's adaptability to both homophilic and heterophilic data. Notably, we reveal that pL-UFG can be interpreted as a generalized non-linear diffusion process, thereby bridging the gap between pL-UFG and differential equations on the graph. Importantly, these multifaceted analyses lead to unified conclusions that offer novel insights for understanding and implementing pL-UFG, as well as other graph neural network (GNN) models. Finally, based on our dynamic analysis, we propose two novel pL-UFG models with manually controlled energy dynamics. We demonstrate empirically and theoretically that our proposed models not only inherit the advantages of pL-UFG but also significantly reduce computational costs for training on large-scale graph datasets.
    Adversarial Policies Beat Superhuman Go AIs. (arXiv:2211.00241v4 [cs.LG] UPDATED)
    We attack the state-of-the-art Go-playing AI system KataGo by training adversarial policies against it, achieving a >97% win rate against KataGo running at superhuman settings. Our adversaries do not win by playing Go well. Instead, they trick KataGo into making serious blunders. Our attack transfers zero-shot to other superhuman Go-playing AIs, and is comprehensible to the extent that human experts can implement it without algorithmic assistance to consistently beat superhuman AIs. The core vulnerability uncovered by our attack persists even in KataGo agents adversarially trained to defend against our attack. Our results demonstrate that even superhuman AI systems may harbor surprising failure modes. Example games are available at https://goattack.far.ai/.
    A Deep Learning Method for Comparing Bayesian Hierarchical Models. (arXiv:2301.11873v3 [stat.ML] UPDATED)
    Bayesian model comparison (BMC) offers a principled approach for assessing the relative merits of competing computational models and propagating uncertainty into model selection decisions. However, BMC is often intractable for the popular class of hierarchical models due to their high-dimensional nested parameter structure. To address this intractability, we propose a deep learning method for performing BMC on any set of hierarchical models which can be instantiated as probabilistic programs. Since our method enables amortized inference, it allows efficient re-estimation of posterior model probabilities and fast performance validation prior to any real-data application. In a series of extensive validation studies, we benchmark the performance of our method against the state-of-the-art bridge sampling method and demonstrate excellent amortized inference across all BMC settings. We then showcase our method by comparing four hierarchical evidence accumulation models that have previously been deemed intractable for BMC due to partly implicit likelihoods. In this application, we corroborate evidence for the recently proposed Lévy flight model of decision-making and show how transfer learning can be leveraged to enhance training efficiency. We provide reproducible code for all analyses and an open-source implementation of our method.
    A kernel Stein test of goodness of fit for sequential models. (arXiv:2210.10741v3 [stat.ML] UPDATED)
    We propose a goodness-of-fit measure for probability densities modeling observations with varying dimensionality, such as text documents of differing lengths or variable-length sequences. The proposed measure is an instance of the kernel Stein discrepancy (KSD), which has been used to construct goodness-of-fit tests for unnormalized densities. The KSD is defined by its Stein operator: current operators used in testing apply to fixed-dimensional spaces. As our main contribution, we extend the KSD to the variable-dimension setting by identifying appropriate Stein operators, and propose a novel KSD goodness-of-fit test. As with the previous variants, the proposed KSD does not require the density to be normalized, allowing the evaluation of a large class of models. Our test is shown to perform well in practice on discrete sequential data benchmarks.
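    For orientation, here is a hedged fixed-dimension illustration of the quantity being generalized: the V-statistic KSD^2 of 1-D samples against a standard normal (score s(x) = -x) with an RBF kernel. The paper's variable-dimension Stein operators are not reproduced here:
    import numpy as np

    def ksd2_vs_std_normal(x, h=1.0):
        """V-statistic estimate of KSD^2 against N(0, 1), RBF kernel width h."""
        x = np.asarray(x, dtype=float)
        d = x[:, None] - x[None, :]
        k = np.exp(-d ** 2 / (2 * h ** 2))
        s = -x                                        # score of N(0, 1)
        u = (s[:, None] * s[None, :] * k              # s(x) s(y) k(x, y)
             + s[:, None] * (d / h ** 2) * k          # s(x) d/dy k
             + s[None, :] * (-d / h ** 2) * k         # s(y) d/dx k
             + (1.0 / h ** 2 - d ** 2 / h ** 4) * k)  # d2/(dx dy) k
        return u.mean()

    print(ksd2_vs_std_normal(np.random.randn(500)))   # near 0 for a good fit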
    Beyond the Snapshot: Brain Tokenized Graph Transformer for Longitudinal Brain Functional Connectome Embedding. (arXiv:2307.00858v2 [q-bio.NC] UPDATED)
    Under the framework of network-based neurodegeneration, brain functional connectome (FC)-based Graph Neural Networks (GNN) have emerged as a valuable tool for the diagnosis and prognosis of neurodegenerative diseases such as Alzheimer's disease (AD). However, these models are tailored for brain FC at a single time point instead of characterizing FC trajectory. Discerning how FC evolves with disease progression, particularly at the predementia stages such as cognitively normal individuals with amyloid deposition or individuals with mild cognitive impairment (MCI), is crucial for delineating disease spreading patterns and developing effective strategies to slow down or even halt disease advancement. In this work, we proposed the first interpretable framework for brain FC trajectory embedding with application to neurodegenerative disease diagnosis and prognosis, namely Brain Tokenized Graph Transformer (Brain TokenGT). It consists of two modules: 1) Graph Invariant and Variant Embedding (GIVE) for generation of node and spatio-temporal edge embeddings, which were tokenized for downstream processing; 2) Brain Informed Graph Transformer Readout (BIGTR) which augments previous tokens with trainable type identifiers and non-trainable node identifiers and feeds them into a standard transformer encoder to readout. We conducted extensive experiments on two public longitudinal fMRI datasets of the AD continuum for three tasks, including differentiating MCI from controls, predicting dementia conversion in MCI, and classification of amyloid positive or negative cognitively normal individuals. Based on brain FC trajectory, the proposed Brain TokenGT approach outperformed all the other benchmark models and at the same time provided excellent interpretability. The code is available at https://github.com/ZijianD/Brain-TokenGT.git
    Improving Small Language Models on PubMedQA via Generative Data Augmentation. (arXiv:2305.07804v3 [cs.CL] UPDATED)
    Large Language Models (LLMs) have made remarkable advancements in the field of natural language processing. However, their increasing size poses challenges in terms of computational cost. On the other hand, Small Language Models (SLMs) are known for their efficiency, but they often struggle with limited capacity and training data, especially in specific domains. In this paper, we introduce a novel method aimed at improving SLMs in the medical domain using LLM-based generative data augmentation. The objective of our approach is to develop more efficient and capable models that are specifically tailored for specialized applications. Through experiments conducted on the PubMedQA dataset, we demonstrate the effectiveness of LLMs in refining and diversifying existing question-answer pairs. This refinement process leads to improved performance in a significantly smaller model after fine-tuning. Notably, our best SLM, with under 1.6 billion parameters, outperforms the few-shot GPT-4 on the PubMedQA dataset. Our code and generated data are publicly available to facilitate further explorations.
    In-context Autoencoder for Context Compression in a Large Language Model. (arXiv:2307.06945v1 [cs.CL])
    We propose the In-context Autoencoder (ICAE) for context compression in a large language model (LLM). The ICAE has two modules: a learnable encoder adapted with LoRA from an LLM for compressing a long context into a limited number of memory slots, and a fixed decoder which is the target LLM that can condition on the memory slots for various purposes. We first pretrain the ICAE using both autoencoding and language modeling objectives on massive text data, enabling it to generate memory slots that accurately and comprehensively represent the original context. Then, we fine-tune the pretrained ICAE on a small amount of instruct data to enhance its interaction with various prompts for producing desirable responses. Our experimental results demonstrate that the ICAE learned with our proposed pretraining and fine-tuning paradigm can effectively produce memory slots with $4\times$ context compression, which can be well conditioned on by the target LLM to respond to various prompts. The promising results demonstrate significant implications of the ICAE for its novel approach to the long context problem and its potential to reduce computation and memory overheads for LLM inference in practice, suggesting further research effort in context management for an LLM. Our code and data will be released shortly.
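    As a rough schematic of the compress-to-memory-slots idea (a sketch, not the authors' implementation: the LoRA adaptation, pretraining objectives, and decoder conditioning are omitted, and all module names and sizes below are invented):

        import torch
        import torch.nn as nn

        class MemorySlotCompressor(nn.Module):
            """Toy sketch: append k learnable memory tokens to the context and
            read out their final hidden states as the compressed memory slots."""
            def __init__(self, vocab_size=32000, d_model=256, num_slots=32):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, d_model)
                self.memory_tokens = nn.Parameter(torch.randn(num_slots, d_model) * 0.02)
                layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
                self.encoder = nn.TransformerEncoder(layer, num_layers=4)
                self.num_slots = num_slots

            def forward(self, context_ids):                     # (batch, seq_len)
                h = self.embed(context_ids)                     # (batch, seq_len, d_model)
                mem = self.memory_tokens.expand(h.size(0), -1, -1)
                h = torch.cat([h, mem], dim=1)                  # context tokens + memory tokens
                return self.encoder(h)[:, -self.num_slots:, :]  # keep only the slot states

        compressor = MemorySlotCompressor()
        slots = compressor(torch.randint(0, 32000, (2, 512)))
        print(slots.shape)  # torch.Size([2, 32, 256]): 512 tokens compressed into 32 slots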
    SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer. (arXiv:2301.12811v2 [cs.LG] UPDATED)
    Generative adversarial networks (GANs) learn a target probability distribution by optimizing a generator and a discriminator with minimax objectives. This paper addresses the question of whether such optimization actually provides the generator with gradients that make its distribution close to the target distribution. We derive metrizable conditions, sufficient conditions for the discriminator to serve as the distance between the distributions by connecting the GAN formulation with the concept of sliced optimal transport. Furthermore, by leveraging these theoretical results, we propose a novel GAN training scheme, called slicing adversarial network (SAN). With only simple modifications, a broad class of existing GANs can be converted to SANs. Experiments on synthetic and image datasets support our theoretical results and the SAN's effectiveness as compared to usual GANs. Furthermore, we also apply SAN to StyleGAN-XL, which leads to state-of-the-art FID score amongst GANs for class conditional generation on ImageNet 256$\times$256.
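    The architectural core of a SAN-style discriminator can be illustrated by constraining its last linear layer to a unit-norm direction, so the score is a projection of features onto a "slice" (a minimal sketch of the idea under our reading, not the official implementation):

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class NormalizedLastLinear(nn.Module):
            """Last discriminator layer with a unit-norm weight, so the score is
            the projection of features onto a learned direction (a 'slice')."""
            def __init__(self, in_features):
                super().__init__()
                self.weight = nn.Parameter(torch.randn(in_features))

            def forward(self, h):                     # h: (batch, in_features)
                w = F.normalize(self.weight, dim=0)   # keep only the direction
                return h @ w                          # (batch,) scores

        body = nn.Sequential(nn.Linear(64, 128), nn.ReLU())  # discriminator body
        last = NormalizedLastLinear(128)
        scores = last(body(torch.randn(8, 64)))
        print(scores.shape)  # torch.Size([8])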
    Local Intrinsic Dimensionality Measures for Graphs, with Applications to Graph Embeddings. (arXiv:2208.11986v2 [cs.LG] UPDATED)
    The notion of local intrinsic dimensionality (LID) is an important advancement in data dimensionality analysis, with applications in data mining, machine learning and similarity search problems. Existing distance-based LID estimators were designed for tabular datasets encompassing data points represented as vectors in a Euclidean space. After discussing their limitations for graph-structured data considering graph embeddings and graph distances, we propose NC-LID, a novel LID-related measure for quantifying the discriminatory power of the shortest-path distance with respect to natural communities of nodes as their intrinsic localities. It is shown how this measure can be used to design LID-aware graph embedding algorithms by formulating two LID-elastic variants of node2vec with personalized hyperparameters that are adjusted according to NC-LID values. Our empirical analysis of NC-LID on a large number of real-world graphs shows that this measure is able to point to nodes with high link reconstruction errors in node2vec embeddings better than node centrality metrics. The experimental evaluation also shows that the proposed LID-elastic node2vec extensions improve node2vec by better preserving graph structure in generated embeddings.
    Climate-Invariant Machine Learning. (arXiv:2112.08440v2 [cs.LG] UPDATED)
    Projecting climate change is a generalization problem: we extrapolate the recent past using physical models across past, present, and future climates. Current climate models require representations of processes that occur at scales smaller than model grid size, which have been the main source of model projection uncertainty. Recent machine learning (ML) algorithms hold promise to improve such process representations, but tend to extrapolate poorly to climate regimes they were not trained on. To get the best of the physical and statistical worlds, we propose a new framework -- termed "climate-invariant" ML -- incorporating knowledge of climate processes into ML algorithms, and show that it can maintain high accuracy across a wide range of climate and geographic conditions in three distinct atmospheric models. Our results suggest that explicitly incorporating physical knowledge into data-driven models of Earth system processes can improve their consistency, data efficiency, and generalizability across climate regimes.
    Tensor Completion via Leverage Sampling and Tensor QR Decomposition for Network Latency Estimation. (arXiv:2307.06848v1 [cs.NI])
    In this paper, we consider network latency estimation, which has been an important metric of network performance. However, large-scale network latency estimation requires substantial computing time, so we propose a new method that is much faster while maintaining high accuracy. The data of network nodes can form a matrix, and introducing the time dimension yields a tensor model, so the entire problem can be summarized as a tensor completion problem. The main idea of our method is to improve the tensor leverage sampling strategy and to introduce tensor QR decomposition into tensor completion. To achieve faster tensor leverage sampling, we replace the tensor singular value decomposition (t-SVD) with tensor CSVD-QR to approximate the t-SVD. To achieve faster completion of the incomplete tensor, we use the tensor $L_{2,1}$-norm rather than the traditional tensor nuclear norm. Furthermore, we introduce tensor QR decomposition into the alternating direction method of multipliers (ADMM) framework. Numerical experiments show that our method is faster than state-of-the-art algorithms with satisfactory accuracy.
    Joint User and Data Detection in Grant-Free NOMA with Attention-based BiLSTM Network. (arXiv:2209.06392v2 [eess.SP] UPDATED)
    We consider the multi-user detection (MUD) problem in uplink grant-free non-orthogonal multiple access (NOMA), where the access point has to identify the total number and correct identity of the active Internet of Things (IoT) devices and decode their transmitted data. We assume that IoT devices use complex spreading sequences and transmit information in a random-access manner following the burst-sparsity model, where some IoT devices transmit their data in multiple adjacent time slots with a high probability, while others transmit only once during a frame. Exploiting the temporal correlation, we propose an attention-based bidirectional long short-term memory (BiLSTM) network to solve the MUD problem. The BiLSTM network creates a pattern of the device activation history using forward and reverse pass LSTMs, whereas the attention mechanism provides essential context to the device activation points. By doing so, a hierarchical pathway is followed for detecting active devices in a grant-free scenario. Then, by utilising the complex spreading sequences, blind data detection for the estimated active devices is performed. The proposed framework does not require prior knowledge of device sparsity levels and channels for performing MUD. The results show that the proposed network achieves better performance compared to existing benchmark schemes.
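    A minimal sketch of the attention-over-time BiLSTM pattern described above (dimensions, the additive attention, and the per-slot activity head are illustrative assumptions, not the paper's exact network):

        import torch
        import torch.nn as nn

        class AttentiveBiLSTM(nn.Module):
            """BiLSTM over time slots with additive attention, followed by a
            per-slot head emitting device-activity logits."""
            def __init__(self, in_dim=32, hidden=64, num_devices=100):
                super().__init__()
                self.lstm = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
                self.attn = nn.Linear(2 * hidden, 1)
                self.head = nn.Linear(2 * hidden, num_devices)

            def forward(self, x):                          # x: (batch, T, in_dim)
                h, _ = self.lstm(x)                        # (batch, T, 2*hidden)
                a = torch.softmax(self.attn(h), dim=1)     # attention weights over time
                ctx = (a * h).sum(dim=1, keepdim=True)     # temporal context vector
                return self.head(h + ctx)                  # per-slot activity logits

        model = AttentiveBiLSTM()
        logits = model(torch.randn(4, 10, 32))             # 4 frames, 10 time slots each
        print(logits.shape)                                # torch.Size([4, 10, 100])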
    Energy-efficient Deployment of Deep Learning Applications on Cortex-M based Microcontrollers using Deep Compression. (arXiv:2205.10369v2 [cs.LG] UPDATED)
    Large Deep Neural Networks (DNNs) are the backbone of today's artificial intelligence due to their ability to make accurate predictions when being trained on huge datasets. With advancing technologies, such as the Internet of Things, interpreting large quantities of data generated by sensors is becoming an increasingly important task. However, in many applications not only the predictive performance but also the energy consumption of deep learning models is of major interest. This paper investigates the efficient deployment of deep learning models on resource-constrained microcontroller architectures via network compression. We present a methodology for the systematic exploration of different DNN pruning, quantization, and deployment strategies, targeting different ARM Cortex-M based low-power systems. The exploration enables the analysis of trade-offs between key metrics such as accuracy, memory consumption, execution time, and power consumption. We discuss experimental results on three different DNN architectures and show that we can compress them to below 10\% of their original parameter count before their predictive quality decreases. This also allows us to deploy and evaluate them on Cortex-M based microcontrollers.
    Generalized Laplacian Regularized Framelet Graph Neural Networks. (arXiv:2210.15092v2 [cs.LG] UPDATED)
    This paper introduces a novel Framelet Graph approach based on p-Laplacian GNN. The proposed two models, named p-Laplacian undecimated framelet graph convolution (pL-UFG) and generalized p-Laplacian undecimated framelet graph convolution (pL-fUFG) inherit the nature of p-Laplacian with the expressive power of multi-resolution decomposition of graph signals. The empirical study highlights the excellent performance of the pL-UFG and pL-fUFG in different graph learning tasks including node classification and signal denoising.
    Declarative Mechanism Design. (arXiv:1912.13122v4 [cs.AI] UPDATED)
    Regulation of Multi-Agent Systems (MAS) and Declarative Electronic Institutions (DEIs) was a multidisciplinary research topic of the past decade, involving (physical and software) agents and law since the beginning, and it has recently evolved towards the news-claimed "robot lawyer" since 2016. One of the first proposals for restricting the behaviour of software agents was Electronic Institutions. However, with the recent reformulation of Artificial Neural Networks (ANNs) as Deep Learning (DL), security, privacy, ethical and legal issues regarding the use of DL have raised concerns in the Artificial Intelligence (AI) community. Now that the regulation of MAS is almost correctly addressed, we propose the regulation of artificial neural networks as agent-based training of a special type of regulated artificial neural network that we call an Institutional Neural Network (INN). The main purpose of this paper is to bring attention to Artificial Teaching (AT) and to give a tentative answer showing a proof-of-concept implementation of Regulated Deep Learning (RDL). This paper introduces the former concept and provides sI, a language previously used to declaratively model and extend Electronic Institutions, as a means to regulate the execution of artificial neural networks and their interactions with Artificial Teachers (ATs).
    Efficient Bayesian Policy Reuse with a Scalable Observation Model in Deep Reinforcement Learning. (arXiv:2204.07729v3 [cs.LG] UPDATED)
    Bayesian policy reuse (BPR) is a general policy transfer framework for selecting a source policy from an offline library by inferring the task belief based on some observation signals and a trained observation model. In this paper, we propose an improved BPR method to achieve more efficient policy transfer in deep reinforcement learning (DRL). First, most BPR algorithms use the episodic return as the observation signal that contains limited information and cannot be obtained until the end of an episode. Instead, we employ the state transition sample, which is informative and instantaneous, as the observation signal for faster and more accurate task inference. Second, BPR algorithms usually require numerous samples to estimate the probability distribution of the tabular-based observation model, which may be expensive and even infeasible to learn and maintain, especially when using the state transition sample as the signal. Hence, we propose a scalable observation model based on fitting state transition functions of source tasks from only a small number of samples, which can generalize to any signals observed in the target task. Moreover, we extend the offline-mode BPR to the continual learning setting by expanding the scalable observation model in a plug-and-play fashion, which can avoid negative transfer when faced with new unknown tasks. Experimental results show that our method can consistently facilitate faster and more efficient policy transfer.
    Learning low-rank latent mesoscale structures in networks. (arXiv:2102.06984v5 [cs.SI] UPDATED)
    It is common to use networks to encode the architecture of interactions between entities in complex systems in the physical, biological, social, and information sciences. To study the large-scale behavior of complex systems, it is useful to examine mesoscale structures in networks as building blocks that influence such behavior. We present a new approach for describing low-rank mesoscale structures in networks, and we illustrate our approach using several synthetic network models and empirical friendship, collaboration, and protein--protein interaction (PPI) networks. We find that these networks possess a relatively small number of `latent motifs' that together can successfully approximate most subgraphs of a network at a fixed mesoscale. We use an algorithm for `network dictionary learning' (NDL), which combines a network-sampling method and nonnegative matrix factorization, to learn the latent motifs of a given network. The ability to encode a network using a set of latent motifs has a wide variety of applications to network-analysis tasks, such as comparison, denoising, and edge inference. Additionally, using a new network denoising and reconstruction (NDR) algorithm, we demonstrate how to denoise a corrupted network by using only the latent motifs that one learns directly from the corrupted network.
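    A rough sketch of the network-dictionary-learning recipe: sample small subgraph patches with random walks and factorize them with NMF to obtain latent motifs (a simplified stand-in for the authors' NDL algorithm; the walk-based sampler below is an assumption):

        import numpy as np
        import networkx as nx
        from sklearn.decomposition import NMF

        def sample_patches(G, k=8, n_samples=500, seed=0):
            """Sample k-node patches via random walks; return flattened adjacency matrices."""
            rng = np.random.default_rng(seed)
            nodes = list(G.nodes)
            patches = []
            for _ in range(n_samples):
                walk = [rng.choice(nodes)]
                for _ in range(10 * k):                  # cap the walk length
                    if len(set(walk)) >= k:
                        break
                    nbrs = list(G.neighbors(walk[-1]))
                    if not nbrs:
                        break
                    walk.append(rng.choice(nbrs))
                sub = list(dict.fromkeys(walk))[:k]      # first k distinct nodes, in order
                if len(sub) == k:
                    A = nx.to_numpy_array(G.subgraph(sub), nodelist=sub)
                    patches.append(A.flatten())
            return np.array(patches)

        G = nx.karate_club_graph()
        X = sample_patches(G)                            # (n_patches, k*k), entries in {0, 1}
        motifs = NMF(n_components=9, max_iter=500).fit(X).components_
        print(motifs.reshape(-1, 8, 8).shape)            # 9 latent motifs as 8x8 patches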
    Accelerated stochastic approximation with state-dependent noise. (arXiv:2307.01497v2 [math.OC] UPDATED)
    We consider a class of stochastic smooth convex optimization problems under rather general assumptions on the noise in the stochastic gradient observation. As opposed to the classical problem setting in which the variance of noise is assumed to be uniformly bounded, herein we assume that the variance of stochastic gradients is related to the "sub-optimality" of the approximate solutions delivered by the algorithm. Such problems naturally arise in a variety of applications, in particular, in the well-known generalized linear regression problem in statistics. However, to the best of our knowledge, none of the existing stochastic approximation algorithms for solving this class of problems attain optimality in terms of the dependence on accuracy, problem parameters, and mini-batch size. We discuss two non-Euclidean accelerated stochastic approximation routines--stochastic accelerated gradient descent (SAGD) and stochastic gradient extrapolation (SGE)--which carry a particular duality relationship. We show that both SAGD and SGE, under appropriate conditions, achieve the optimal convergence rate, attaining the optimal iteration and sample complexities simultaneously. However, corresponding assumptions for the SGE algorithm are more general; they allow, for instance, for efficient application of the SGE to statistical estimation problems under heavy tail noises and discontinuous score functions. We also discuss the application of the SGE to problems satisfying quadratic growth conditions, and show how it can be used to recover sparse solutions. Finally, we report on some simulation experiments to illustrate numerical performance of our proposed algorithms in high-dimensional settings.
    Multiple Testing Framework for Out-of-Distribution Detection. (arXiv:2206.09522v4 [stat.ML] UPDATED)
    We study the problem of Out-of-Distribution (OOD) detection, that is, detecting whether a learning algorithm's output can be trusted at inference time. While a number of tests for OOD detection have been proposed in prior work, a formal framework for studying this problem is lacking. We propose a definition for the notion of OOD that includes both the input distribution and the learning algorithm, which provides insights for the construction of powerful tests for OOD detection. We propose a multiple hypothesis testing inspired procedure to systematically combine any number of different statistics from the learning algorithm using conformal p-values. We further provide strong guarantees on the probability of incorrectly classifying an in-distribution sample as OOD. In our experiments, we find that threshold-based tests proposed in prior work perform well in specific settings, but not uniformly well across different types of OOD instances. In contrast, our proposed method that combines multiple statistics performs uniformly well across different datasets and neural networks.
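    The combining step can be illustrated as follows: convert each statistic into a conformal p-value against a held-out in-distribution calibration set, then merge the p-values. Fisher's method below is a classical stand-in for the paper's combination procedure, shown only to make the pipeline concrete:

        import numpy as np
        from scipy import stats

        def conformal_pvalue(calib_scores, test_score):
            """Fraction of calibration scores at least as extreme (higher = more OOD)."""
            n = len(calib_scores)
            return (1 + np.sum(calib_scores >= test_score)) / (n + 1)

        rng = np.random.default_rng(0)
        # Two per-example statistics (e.g., negative max-softmax and a feature
        # distance), computed on in-distribution calibration data and a test input.
        calib = [rng.normal(size=1000), rng.normal(size=1000)]
        test_scores = [2.5, 1.8]

        pvals = [conformal_pvalue(c, s) for c, s in zip(calib, test_scores)]
        # Fisher's combination: -2 * sum(log p) ~ chi-squared with 2m dof under the null.
        fisher_stat = -2.0 * np.sum(np.log(pvals))
        combined_p = stats.chi2.sf(fisher_stat, df=2 * len(pvals))
        print(pvals, combined_p)   # flag OOD when combined_p falls below the chosen level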
    The Effectiveness of World Models for Continual Reinforcement Learning. (arXiv:2211.15944v2 [cs.LG] UPDATED)
    World models power some of the most efficient reinforcement learning algorithms. In this work, we showcase that they can be harnessed for continual learning - a situation when the agent faces changing environments. World models typically employ a replay buffer for training, which can be naturally extended to continual learning. We systematically study how different selective experience replay methods affect performance, forgetting, and transfer. We also provide recommendations regarding various modeling options for using world models. The best set of choices, which we call Continual-Dreamer, is task-agnostic and utilizes the world model for continual exploration. Continual-Dreamer is sample efficient and outperforms state-of-the-art task-agnostic continual reinforcement learning methods on Minigrid and Minihack benchmarks.
    Classification and Generation of real-world data with an Associative Memory Model. (arXiv:2207.04827v4 [cs.NE] UPDATED)
    Drawing from memory the face of a friend you have not seen in years is a difficult task. However, if you happen to cross paths, you would easily recognize each other. The biological memory is equipped with an impressive compression algorithm that can store the essential, and then infer the details to match perception. The Willshaw Memory is a simple abstract model for cortical computations which implements mechanisms of biological memories. Using our recently proposed sparse coding prescription for visual patterns, this model can store and retrieve an impressive amount of real-world data in a fault-tolerant manner. In this paper, we extend the capabilities of the basic Associative Memory Model by using a Multiple-Modality framework. In this setting, the memory stores several modalities (e.g., visual, or textual) of each pattern simultaneously. After training, the memory can be used to infer missing modalities when just a subset is perceived. Using a simple encoder-memory-decoder architecture, and a newly proposed iterative retrieval algorithm for the Willshaw Model, we perform experiments on the MNIST dataset. By storing both the images and labels as modalities, a single Memory can be used not only to retrieve and complete patterns but also to classify and generate new ones. We further discuss how this model could be used for other learning tasks, thus serving as a biologically-inspired framework for learning.
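    For reference, the basic Willshaw memory that the paper extends can be written in a few lines (a textbook sketch with sparse binary patterns; the multiple-modality framework and iterative retrieval are not reproduced):

        import numpy as np

        rng = np.random.default_rng(0)
        n, m, n_patterns, k = 256, 256, 50, 8   # input/output sizes, pattern count, active bits

        def sparse_pattern(size):
            v = np.zeros(size, dtype=np.uint8)
            v[rng.choice(size, k, replace=False)] = 1
            return v

        xs = [sparse_pattern(n) for _ in range(n_patterns)]
        ys = [sparse_pattern(m) for _ in range(n_patterns)]

        # Storage: binary Hebbian learning, clipped at 1.
        W = np.zeros((m, n), dtype=np.uint8)
        for x, y in zip(xs, ys):
            W |= np.outer(y, x)

        # Retrieval: threshold the dendritic sums at the number of active input bits.
        retrieve = lambda x: (W @ x >= x.sum()).astype(np.uint8)

        print(np.array_equal(retrieve(xs[0]), ys[0]))   # True at light memory loads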
    Adapting to Mixing Time in Stochastic Optimization with Markovian Data. (arXiv:2202.04428v3 [cs.LG] UPDATED)
    We consider stochastic optimization problems where data is drawn from a Markov chain. Existing methods for this setting crucially rely on knowing the mixing time of the chain, which in real-world applications is usually unknown. We propose the first optimization method that does not require the knowledge of the mixing time, yet obtains the optimal asymptotic convergence rate when applied to convex problems. We further show that our approach can be extended to: (i) finding stationary points in non-convex optimization with Markovian data, and (ii) obtaining better dependence on the mixing time in temporal difference (TD) learning; in both cases, our method is completely oblivious to the mixing time. Our method relies on a novel combination of multi-level Monte Carlo (MLMC) gradient estimation together with an adaptive learning method.
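    The MLMC gradient estimator at the heart of this approach can be sketched as follows (a schematic under our reading of the estimator; the chain sampler, `grad`, and the toy AR(1) usage are placeholders):

        import numpy as np

        def mlmc_gradient(grad, sample_chain, x, max_level=10, rng=None):
            """Multi-level Monte Carlo gradient estimate for Markovian data,
            oblivious to the mixing time: grad(x, batch) averages gradients
            over a batch of consecutive Markov-chain samples."""
            rng = rng or np.random.default_rng()
            J = int(min(rng.geometric(0.5), max_level))   # P(J = j) = 2^{-j}
            data = sample_chain(2 ** J)                   # one trajectory of 2^J samples
            g0 = grad(x, data[:1])                        # level-0 (single-sample) estimate
            gJ = grad(x, data)                            # fine level: all 2^J samples
            gJm1 = grad(x, data[: 2 ** (J - 1)])          # coarse level: first half
            return g0 + (2 ** J) * (gJ - gJm1)            # bias-corrected combination

        # Toy usage: AR(1) chain z_{t+1} = 0.9 z_t + noise, loss f(x) = E[(x - z)^2] / 2.
        state = {"z": 0.0}
        def sample_chain(t):
            out = []
            for _ in range(t):
                state["z"] = 0.9 * state["z"] + np.random.randn()
                out.append(state["z"])
            return np.array(out)

        grad = lambda x, batch: np.mean(x - batch)
        print(mlmc_gradient(grad, sample_chain, x=1.0))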
    The complexity of non-stationary reinforcement learning. (arXiv:2307.06877v1 [cs.LG])
    The problem of continual learning in the domain of reinforcement learning, often called non-stationary reinforcement learning, has been identified as an important challenge to the application of reinforcement learning. We prove a worst-case complexity result, which we believe captures this challenge: Modifying the probabilities or the reward of a single state-action pair in a reinforcement learning problem requires an amount of time almost as large as the number of states in order to keep the value function up to date, unless the strong exponential time hypothesis (SETH) is false; SETH is a widely accepted strengthening of the P $\neq$ NP conjecture. Recall that the number of states in current applications of reinforcement learning is typically astronomical. In contrast, we show that just $\textit{adding}$ a new state-action pair is considerably easier to implement.
    Uncovering Unique Concept Vectors through Latent Space Decomposition. (arXiv:2307.06913v1 [cs.LG])
    Interpreting the inner workings of deep learning models is crucial for establishing trust and ensuring model safety. Concept-based explanations have emerged as a superior approach that is more interpretable than feature attribution estimates such as pixel saliency. However, defining the concepts for the interpretability analysis biases the explanations by the user's expectations on the concepts. To address this, we propose a novel post-hoc unsupervised method that automatically uncovers the concepts learned by deep models during training. By decomposing the latent space of a layer in singular vectors and refining them by unsupervised clustering, we uncover concept vectors aligned with directions of high variance that are relevant to the model prediction, and that point to semantically distinct concepts. Our extensive experiments reveal that the majority of our concepts are readily understandable to humans, exhibit coherency, and bear relevance to the task at hand. Moreover, we showcase the practical utility of our method in dataset exploration, where our concept vectors successfully identify outlier training samples affected by various confounding factors. This novel exploration technique has remarkable versatility to data types and model architectures and it will facilitate the identification of biases and the discovery of sources of error within training data.
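    A bare-bones version of the decomposition step might look like this (hypothetical sketch: real usage would replace the random matrix with a layer's activations collected over a dataset, and the refinement below is a simplification):

        import numpy as np
        from sklearn.cluster import KMeans

        # Stand-in for a layer's activations over 1000 inputs (dim 64); in real
        # usage these would be collected from the trained model on a dataset.
        rng = np.random.default_rng(0)
        acts = rng.normal(size=(1000, 64))

        # Decompose the latent space: right singular vectors = candidate directions.
        U, S, Vt = np.linalg.svd(acts - acts.mean(axis=0), full_matrices=False)
        top_dirs = Vt[:10]                        # directions of highest variance

        # Refine by unsupervised clustering of the strongest responders per direction.
        responses = acts @ top_dirs.T             # (1000, 10)
        concept_vectors = []
        for j in range(top_dirs.shape[0]):
            strong = acts[np.argsort(-responses[:, j])[:100]]
            centers = KMeans(n_clusters=2, n_init=10).fit(strong).cluster_centers_
            concept_vectors.extend(centers)       # refined concept candidates

        print(np.array(concept_vectors).shape)    # (20, 64)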
    Identifying Early Help Referrals For Local Authorities With Machine Learning And Bias Analysis. (arXiv:2307.06871v1 [cs.LG])
    Local authorities in England, such as Leicestershire County Council (LCC), provide Early Help services that can be offered at any point in a young person's life when they experience difficulties that cannot be supported by universal services alone, such as schools. This paper investigates the utilisation of machine learning (ML) to assist experts in identifying families that may need to be referred for Early Help assessment and support. LCC provided an anonymised dataset comprising 14360 records of young people under the age of 18. The dataset was pre-processed, machine learning models were built, and experiments were conducted to validate and test the performance of the models. Bias mitigation techniques were applied to improve the fairness of these models. During testing, while the models demonstrated the capability to identify young people requiring intervention or early help, they also produced a significant number of false positives, especially when constructed with imbalanced data, incorrectly identifying individuals who most likely did not need an Early Help referral. This paper empirically explores the suitability of data-driven ML models for identifying young people who may require Early Help services and discusses their appropriateness and limitations for this task.
    Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and Optimality. (arXiv:2307.06915v1 [stat.ML])
    Stochastic Gradient Descent (SGD) is one of the simplest and most popular algorithms in modern statistical and machine learning due to its computational and memory efficiency. Various averaging schemes have been proposed to accelerate the convergence of SGD in different settings. In this paper, we explore a general averaging scheme for SGD. Specifically, we establish the asymptotic normality of a broad range of weighted averaged SGD solutions and provide asymptotically valid online inference approaches. Furthermore, we propose an adaptive averaging scheme that exhibits both optimal statistical rate and favorable non-asymptotic convergence, drawing insights from the optimal weight for the linear model in terms of non-asymptotic mean squared error (MSE).
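    A minimal sketch of weighted iterate averaging on a toy quadratic, using polynomially growing weights as one concrete scheme (the paper's adaptive weighting is not reproduced here):

        import numpy as np

        rng = np.random.default_rng(0)
        x, x_avg, wsum = np.zeros(5), np.zeros(5), 0.0
        x_star = np.ones(5)          # minimizer of f(x) = E[ ||x - x_star||^2 ] / 2

        for t in range(1, 10001):
            g = (x - x_star) + 0.5 * rng.normal(size=5)   # stochastic gradient
            x -= 0.5 / np.sqrt(t) * g                     # SGD step
            w = float(t) ** 2                             # polynomially growing weight
            wsum += w
            x_avg += (w / wsum) * (x - x_avg)             # online weighted average

        print(np.linalg.norm(x - x_star), np.linalg.norm(x_avg - x_star))
        # The weighted average typically lands closer to x_star than the last iterate.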
    Ensemble learning for blending gridded satellite and gauge-measured precipitation data. (arXiv:2307.06840v1 [cs.LG])
    Regression algorithms are regularly used for improving the accuracy of satellite precipitation products. In this context, ground-based measurements are the dependent variable and the satellite data are the predictor variables, together with topography factors. Alongside this, it is increasingly recognised in many fields that combinations of algorithms through ensemble learning can lead to substantial predictive performance improvements. Still, a sufficient number of ensemble learners for improving the accuracy of satellite precipitation products and their large-scale comparison are currently missing from the literature. In this work, we fill this specific gap by proposing 11 new ensemble learners in the field and by extensively comparing them for the entire contiguous United States and for a 15-year period. We use monthly data from the PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks) and IMERG (Integrated Multi-satellitE Retrievals for GPM) gridded datasets. We also use gauge-measured precipitation data from the Global Historical Climatology Network monthly database, version 2 (GHCNm). The ensemble learners combine the predictions by six regression algorithms (base learners), namely the multivariate adaptive regression splines (MARS), multivariate adaptive polynomial splines (poly-MARS), random forests (RF), gradient boosting machines (GBM), extreme gradient boosting (XGBoost) and Bayesian regularized neural networks (BRNN), and each of them is based on a different combiner. The combiners include the equal-weight combiner, the median combiner, two best learners and seven variants of a sophisticated stacking method. The latter stacks a regression algorithm on top of the base learners to combine their independent predictions...
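    To make the combiner idea concrete, below is a minimal median combiner and a stacking ensemble with generic scikit-learn regressors (stand-ins; the study's MARS, BRNN, and precipitation data are not reproduced):

        import numpy as np
        from sklearn.datasets import make_regression
        from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                                      StackingRegressor)
        from sklearn.linear_model import LinearRegression, Ridge
        from sklearn.model_selection import train_test_split

        X, y = make_regression(n_samples=600, n_features=8, noise=10.0, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        base = [("rf", RandomForestRegressor(random_state=0)),
                ("gbm", GradientBoostingRegressor(random_state=0)),
                ("lin", LinearRegression())]

        # Median combiner: pointwise median of the base learners' predictions.
        preds = np.column_stack([m.fit(X_tr, y_tr).predict(X_te) for _, m in base])
        median_pred = np.median(preds, axis=1)

        # Stacking: a regressor trained on out-of-fold base predictions.
        stack = StackingRegressor(estimators=base, final_estimator=Ridge()).fit(X_tr, y_tr)

        rmse = lambda p: float(np.sqrt(np.mean((p - y_te) ** 2)))
        print("median:", rmse(median_pred), "stacking:", rmse(stack.predict(X_te)))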
    AnuraSet: A dataset for benchmarking Neotropical anuran calls identification in passive acoustic monitoring. (arXiv:2307.06860v1 [cs.SD])
    Global change is predicted to induce shifts in anuran acoustic behavior, which can be studied through passive acoustic monitoring (PAM). Understanding changes in calling behavior requires the identification of anuran species, which is challenging due to the particular characteristics of neotropical soundscapes. In this paper, we introduce a large-scale multi-species dataset of anuran amphibians calls recorded by PAM, that comprises 27 hours of expert annotations for 42 different species from two Brazilian biomes. We provide open access to the dataset, including the raw recordings, experimental setup code, and a benchmark with a baseline model of the fine-grained categorization problem. Additionally, we highlight the challenges of the dataset to encourage machine learning researchers to solve the problem of anuran call identification towards conservation policy. All our experiments and resources can be found on our GitHub repository https://github.com/soundclim/anuraset.
    Towards Learning to Imitate from a Single Video Demonstration. (arXiv:1901.07186v4 [cs.LG] UPDATED)
    Agents that can learn to imitate given video observation -- \emph{without direct access to state or action information} -- are more applicable to learning in the natural world. However, formulating a reinforcement learning (RL) agent that facilitates this goal remains a significant challenge. We approach this challenge using contrastive training to learn a reward function comparing an agent's behaviour with a single demonstration. We use a Siamese recurrent neural network architecture to learn rewards in space and time between motion clips while training an RL policy to minimize this distance. Through experimentation, we also find that the inclusion of multi-task data and additional image encoding losses improve the temporal consistency of the learned rewards and, as a result, significantly improves policy learning. We demonstrate our approach on simulated humanoid, dog, and raptor agents in 2D and a quadruped and a humanoid in 3D. We show that our method outperforms current state-of-the-art techniques in these environments and can learn to imitate from a single video demonstration.
    Emergent Neural Network Mechanisms for Generalization to Objects in Novel Orientations. (arXiv:2109.13445v2 [cs.CV] UPDATED)
    The capability of Deep Neural Networks (DNNs) to recognize objects in orientations outside the distribution of the training data is not well understood. We present evidence that DNNs are capable of generalizing to objects in novel orientations by disseminating orientation-invariance obtained from familiar objects seen from many viewpoints. This capability strengthens when training the DNN with an increasing number of familiar objects, but only in orientations that involve 2D rotations of familiar orientations. We show that this dissemination is achieved via neurons tuned to common features between familiar and unfamiliar objects. These results implicate brain-like neural mechanisms for generalization.
    Fast and Functional Structured Data Generators Rooted in Out-of-Equilibrium Physics. (arXiv:2307.06797v1 [cs.LG])
    In this study, we address the challenge of using energy-based models to produce high-quality, label-specific data in complex structured datasets, such as population genetics, RNA or protein sequences data. Traditional training methods encounter difficulties due to inefficient Markov chain Monte Carlo mixing, which affects the diversity of synthetic data and increases generation times. To address these issues, we use a novel training algorithm that exploits non-equilibrium effects. This approach, applied to the Restricted Boltzmann Machine, improves the model's ability to correctly classify samples and generate high-quality synthetic data in only a few sampling steps. The effectiveness of this method is demonstrated by its successful application to four different types of data: handwritten digits, mutations of human genomes classified by continental origin, functionally characterized sequences of an enzyme protein family, and homologous RNA sequences from specific taxonomies.
    TinyMetaFed: Efficient Federated Meta-Learning for TinyML. (arXiv:2307.06822v1 [cs.LG])
    The field of Tiny Machine Learning (TinyML) has made substantial advancements in democratizing machine learning on low-footprint devices, such as microcontrollers. The prevalence of these miniature devices raises the question of whether aggregating their knowledge can benefit TinyML applications. Federated meta-learning is a promising answer to this question, as it addresses the scarcity of labeled data and heterogeneous data distribution across devices in the real world. However, deployment on TinyML hardware faces unique resource constraints, making existing methods impractical due to energy, privacy, and communication limitations. We introduce TinyMetaFed, a model-agnostic meta-learning framework suitable for TinyML. TinyMetaFed facilitates collaborative training of a neural network initialization that can be quickly fine-tuned on new devices. It offers communication savings and privacy protection through partial local reconstruction and Top-P% selective communication, computational efficiency via online learning, and robustness to client heterogeneity through few-shot learning. The evaluations on three TinyML use cases demonstrate that TinyMetaFed can significantly reduce energy consumption and communication overhead, accelerate convergence, and stabilize the training process.
    Enhancing Reliability in Federated mmWave Networks: A Practical and Scalable Solution using Radar-Aided Dynamic Blockage Recognition. (arXiv:2307.06834v1 [cs.NI])
    This article introduces a new method to improve the dependability of millimeter-wave (mmWave) and terahertz (THz) network services in dynamic outdoor environments. In these settings, line-of-sight (LoS) connections are easily interrupted by moving obstacles like humans and vehicles. The proposed approach, coined as Radar-aided Dynamic blockage Recognition (RaDaR), leverages radar measurements and federated learning (FL) to train a dual-output neural network (NN) model capable of simultaneously predicting blockage status and time. This enables determining the optimal point for proactive handover (PHO) or beam switching, thereby reducing the latency introduced by 5G new radio procedures and ensuring high quality of experience (QoE). The framework employs radar sensors to monitor and track objects movement, generating range-angle and range-velocity maps that are useful for scene analysis and predictions. Moreover, FL provides additional benefits such as privacy protection, scalability, and knowledge sharing. The framework is assessed using an extensive real-world dataset comprising mmWave channel information and radar data. The evaluation results show that RaDaR substantially enhances network reliability, achieving an average success rate of 94% for PHO compared to existing reactive HO procedures that lack proactive blockage prediction. Additionally, RaDaR maintains a superior QoE by ensuring sustained high throughput levels and minimising PHO latency.
    Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks. (arXiv:2307.06887v1 [cs.LG])
    Feature learning, i.e. extracting meaningful representations of data, is quintessential to the practical success of neural networks trained with gradient descent, yet it is notoriously difficult to explain how and why it occurs. Recent theoretical studies have shown that shallow neural networks optimized on a single task with gradient-based methods can learn meaningful features, extending our understanding beyond the neural tangent kernel or random feature regime in which negligible feature learning occurs. But in practice, neural networks are increasingly often trained on {\em many} tasks simultaneously with differing loss functions, and these prior analyses do not generalize to such settings. In the multi-task learning setting, a variety of studies have shown effective feature learning by simple linear models. However, multi-task learning via {\em nonlinear} models, arguably the most common learning paradigm in practice, remains largely mysterious. In this work, we present the first results proving feature learning occurs in a multi-task setting with a nonlinear model. We show that when the tasks are binary classification problems with labels depending on only $r$ directions within the ambient $d\gg r$-dimensional input space, executing a simple gradient-based multitask learning algorithm on a two-layer ReLU neural network learns the ground-truth $r$ directions. In particular, any downstream task on the $r$ ground-truth coordinates can be solved by learning a linear classifier with sample and neuron complexity independent of the ambient dimension $d$, while a random feature model requires exponential complexity in $d$ for such a guarantee.
    Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models. (arXiv:2307.06925v1 [cs.CV])
    Text-to-image (T2I) personalization allows users to guide the creative image generation process by combining their own visual concepts in natural language prompts. Recently, encoder-based techniques have emerged as a new effective approach for T2I personalization, reducing the need for multiple images and long training times. However, most existing encoders are limited to a single-class domain, which hinders their ability to handle diverse concepts. In this work, we propose a domain-agnostic method that does not require any specialized dataset or prior information about the personalized concepts. We introduce a novel contrastive-based regularization technique to maintain high fidelity to the target concept characteristics while keeping the predicted embeddings close to editable regions of the latent space, by pushing the predicted tokens toward their nearest existing CLIP tokens. Our experimental results demonstrate the effectiveness of our approach and show how the learned tokens are more semantic than tokens predicted by unregularized models. This leads to a better representation that achieves state-of-the-art performance while being more flexible than previous methods.
    FDAPT: Federated Domain-adaptive Pre-training for Language Models. (arXiv:2307.06933v1 [cs.LG])
    Combining Domain-adaptive Pre-training (DAPT) with Federated Learning (FL) can enhance model adaptation by leveraging more sensitive and distributed data while preserving data privacy. However, few studies have focused on this method. Therefore, we conduct the first comprehensive empirical study to evaluate the performance of Federated Domain-adaptive Pre-training (FDAPT). We demonstrate that FDAPT can maintain competitive downstream task performance to the centralized baseline in both IID and non-IID situations. Furthermore, we propose a novel algorithm, Frozen Federated Domain-adaptive Pre-training (FFDAPT). FFDAPT improves the computational efficiency by 12.1% on average and exhibits similar downstream task performance to standard FDAPT, with general performance fluctuations remaining less than 1%. Finally, through a critical evaluation of our work, we identify promising future research directions for this new research area.
    CLAIMED -- the open source framework for building coarse-grained operators for accelerated discovery in science. (arXiv:2307.06824v1 [cs.AI])
    In modern data-driven science, reproducibility and reusability are key challenges. Scientists are well skilled in the process from data to publication. Although some publication channels require source code and data to be made accessible, rerunning and verifying experiments is usually hard due to a lack of standards. Therefore, reusing existing scientific data processing code from state-of-the-art research is hard as well. This is why we introduce CLAIMED, which has a proven track record in scientific research for addressing the repeatability and reusability issues in modern data-driven science. CLAIMED is a framework for building reusable operators and scalable scientific workflows; it supports scientists in drawing from previous work by re-composing workflows from existing libraries of coarse-grained scientific operators. Although various implementations exist, CLAIMED is programming language, scientific library, and execution environment agnostic.
    Min-Max Optimization under Delays. (arXiv:2307.06886v1 [cs.LG])
    Delays and asynchrony are inevitable in large-scale machine-learning problems where communication plays a key role. As such, several works have extensively analyzed stochastic optimization with delayed gradients. However, as far as we are aware, no analogous theory is available for min-max optimization, a topic that has gained recent popularity due to applications in adversarial robustness, game theory, and reinforcement learning. Motivated by this gap, we examine the performance of standard min-max optimization algorithms with delayed gradient updates. First, we show (empirically) that even small delays can cause prominent algorithms like Extra-gradient (\texttt{EG}) to diverge on simple instances for which \texttt{EG} guarantees convergence in the absence of delays. Our empirical study thus suggests the need for a careful analysis of delayed versions of min-max optimization algorithms. Accordingly, under suitable technical assumptions, we prove that Gradient Descent-Ascent (\texttt{GDA}) and \texttt{EG} with delayed updates continue to guarantee convergence to saddle points for convex-concave and strongly convex-strongly concave settings. Our complexity bounds reveal, in a transparent manner, the slow-down in convergence caused by delays.
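    The delayed-update setting is easy to simulate. The sketch below runs gradient descent-ascent on a strongly convex-strongly concave toy objective with gradients that arrive tau steps late, illustrating the setting (not the paper's analysis or bounds):

        import numpy as np
        from collections import deque

        # f(x, y) = 0.5*x^2 + 2*x*y - 0.5*y^2: strongly convex in x, strongly
        # concave in y, with its unique saddle point at (0, 0).
        grad_x = lambda x, y: x + 2 * y
        grad_y = lambda x, y: 2 * x - y

        def delayed_gda(tau, eta=0.05, steps=2000):
            x, y = 1.0, 1.0
            buf = deque()                       # gradients in flight
            for _ in range(steps):
                buf.append((grad_x(x, y), grad_y(x, y)))
                if len(buf) > tau:              # the gradient computed tau steps ago arrives
                    gx, gy = buf.popleft()
                    x -= eta * gx               # descent in x
                    y += eta * gy               # ascent in y
            return abs(x) + abs(y)              # distance to the saddle point

        for tau in [0, 5, 20]:
            print(tau, delayed_gda(tau))        # convergence degrades as the delay grows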
    Privacy-Utility Trade-offs in Neural Networks for Medical Population Graphs: Insights from Differential Privacy and Graph Structure. (arXiv:2307.06760v1 [cs.LG])
    We initiate an empirical investigation into differentially private graph neural networks on population graphs from the medical domain by examining privacy-utility trade-offs at different privacy levels on both real-world and synthetic datasets and performing auditing through membership inference attacks. Our findings highlight the potential and the challenges of this specific DP application area. Moreover, we find evidence that the underlying graph structure constitutes a potential factor for larger performance gaps by showing a correlation between the degree of graph homophily and the accuracy of the trained model.
    Data-driven Nonlinear Parametric Model Order Reduction Framework using Deep Hierarchical Variational Autoencoder. (arXiv:2307.06816v1 [cs.LG])
    A data-driven parametric model order reduction (MOR) method using a deep artificial neural network is proposed. The present network, which is the least-squares hierarchical variational autoencoder (LSH-VAE), is capable of performing nonlinear MOR for the parametric interpolation of a nonlinear dynamic system with a significant number of degrees of freedom. LSH-VAE exploits two major changes to the existing networks: a hierarchical deep structure and a hybrid weighted, probabilistic loss function. The enhancements result in a significantly improved accuracy and stability compared against the conventional nonlinear MOR methods, autoencoder, and variational autoencoder. Upon LSH-VAE, a parametric MOR framework is presented based on the spherically linear interpolation of the latent manifold. The present framework is validated and evaluated on three nonlinear and multiphysics dynamic systems. First, the present framework is evaluated on the fluid-structure interaction benchmark problem to assess its efficiency and accuracy. Then, a highly nonlinear aeroelastic phenomenon, limit cycle oscillation, is analyzed. Finally, the present framework is applied to a three-dimensional fluid flow to demonstrate its capability of efficiently analyzing a significantly large number of degrees of freedom. The performance of LSH-VAE is emphasized by comparing its results against those of the widely used nonlinear MOR methods, convolutional autoencoder, and $\beta$-VAE. The present framework exhibits significantly enhanced accuracy compared to the conventional methods while still exhibiting a large speed-up factor.
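    The spherically linear interpolation used to traverse the latent manifold is a standard formula; a minimal version, independent of the LSH-VAE network, is:

        import numpy as np

        def slerp(z0, z1, t):
            """Spherical linear interpolation between latent vectors z0 and z1."""
            z0n, z1n = z0 / np.linalg.norm(z0), z1 / np.linalg.norm(z1)
            omega = np.arccos(np.clip(np.dot(z0n, z1n), -1.0, 1.0))  # angle between them
            if np.isclose(omega, 0.0):
                return (1 - t) * z0 + t * z1     # nearly parallel: fall back to lerp
            return (np.sin((1 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

        rng = np.random.default_rng(0)
        z_a, z_b = rng.normal(size=16), rng.normal(size=16)
        path = [slerp(z_a, z_b, t) for t in np.linspace(0, 1, 5)]  # interpolants to decode
        print(np.round([np.linalg.norm(p) for p in path], 3))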
    PC-Droid: Faster diffusion and improved quality for particle cloud generation. (arXiv:2307.06836v1 [hep-ex])
    Building on the success of PC-JeDi we introduce PC-Droid, a substantially improved diffusion model for the generation of jet particle clouds. By leveraging a new diffusion formulation, studying more recent integration solvers, and training on all jet types simultaneously, we are able to achieve state-of-the-art performance for all types of jets across all evaluation metrics. We study the trade-off between generation speed and quality by comparing two attention based architectures, as well as the potential of consistency distillation to reduce the number of diffusion steps. Both the faster architecture and consistency models demonstrate performance surpassing many competing models, with generation time up to two orders of magnitude faster than PC-JeDi.
    Generalizing Supervised Deep Learning MRI Reconstruction to Multiple and Unseen Contrasts using Meta-Learning Hypernetworks. (arXiv:2307.06771v1 [eess.IV])
    Meta-learning has recently been an emerging data-efficient learning technique for various medical imaging operations and has helped advance contemporary deep learning models. Furthermore, meta-learning enhances the knowledge generalization of the imaging tasks by learning both shared and discriminative weights for various configurations of imaging tasks. However, existing meta-learning models attempt to learn a single set of weight initializations of a neural network that might be restrictive for multimodal data. This work aims to develop a multimodal meta-learning model for image reconstruction, which augments meta-learning with evolutionary capabilities to encompass diverse acquisition settings of multimodal data. Our proposed model called KM-MAML (Kernel Modulation-based Multimodal Meta-Learning), has hypernetworks that evolve to generate mode-specific weights. These weights provide the mode-specific inductive bias for multiple modes by re-calibrating each kernel of the base network for image reconstruction via a low-rank kernel modulation operation. We incorporate gradient-based meta-learning (GBML) in the contextual space to update the weights of the hypernetworks for different modes. The hypernetworks and the reconstruction network in the GBML setting provide discriminative mode-specific features and low-level image features, respectively. Experiments on multi-contrast MRI reconstruction show that our model, (i) exhibits superior reconstruction performance over joint training, other meta-learning methods, and context-specific MRI reconstruction methods, and (ii) better adaptation capabilities with improvement margins of 0.5 dB in PSNR and 0.01 in SSIM. Besides, a representation analysis with U-Net shows that kernel modulation infuses 80% of mode-specific representation changes in the high-resolution layers. Our source code is available at https://github.com/sriprabhar/KM-MAML/.
    Robotic surface exploration with vision and tactile sensing for cracks detection and characterisation. (arXiv:2307.06784v1 [cs.RO])
    This paper presents a novel algorithm for crack localisation and detection based on visual and tactile analysis via fibre-optics. A finger-shaped sensor based on fibre-optics is employed for the data acquisition to collect data for the analysis and the experiments. To detect the possible locations of cracks, a camera is used to scan an environment while running an object detection algorithm. Once the crack is detected, a fully-connected graph is created from a skeletonised version of the crack. A minimum spanning tree is then employed for calculating the shortest path to explore the crack, which is then used to develop the motion planner for the robotic manipulator. The motion planner divides the crack into multiple nodes which are then explored individually. Then, the manipulator starts the exploration and performs the tactile data classification to confirm if there is indeed a crack in that location or just a false positive from the vision algorithm. If a crack is detected, its length, width, orientation and number of branches are also calculated. This is repeated until all the nodes of the crack are explored. In order to validate the complete algorithm, various experiments are performed: comparison of crack exploration via full scan versus the motion planning algorithm, implementation of frequency-based features for crack classification, and geometry analysis using a combination of vision and tactile data. The results of the experiments show that the proposed algorithm is able to detect cracks and improve the results obtained from vision to correctly classify cracks and their geometry with minimal cost, thanks to the motion planning algorithm.
    Federated Multi-Agent Deep Reinforcement Learning for Dynamic and Flexible 3D Operation of 5G Multi-MAP Networks. (arXiv:2307.06842v1 [cs.NI])
    This paper addresses the efficient management of Mobile Access Points (MAPs), which are Unmanned Aerial Vehicles (UAV), in 5G networks. We propose a two-level hierarchical architecture, which dynamically reconfigures the network while considering Integrated Access-Backhaul (IAB) constraints. The high-layer decision process determines the number of MAPs through consensus, and we develop a joint optimization process to account for co-dependence in network self-management. In the low-layer, MAPs manage their placement using a double-attention based Deep Reinforcement Learning (DRL) model that encourages cooperation without retraining. To improve generalization and reduce complexity, we propose a federated mechanism for training and sharing one placement model for every MAP in the low-layer. Additionally, we jointly optimize the placement and backhaul connectivity of MAPs using a multi-objective reward function, considering the impact of varying MAP placement on wireless backhaul connectivity.
    Self-consistency for open-ended generations. (arXiv:2307.06857v1 [cs.AI])
    In this paper, we present a novel approach for improving the quality and consistency of generated outputs from large-scale pre-trained language models (LLMs). Self-consistency has emerged as an effective approach for prompts with fixed answers, selecting the answer with the highest number of votes. In this paper, we introduce a generalized framework for self-consistency that extends its applicability beyond problems that have fixed answers. Through extensive simulations, we demonstrate that our approach consistently recovers the optimal or near-optimal generation from a set of candidates. We also propose lightweight parameter-free similarity functions that show significant and consistent improvements across code generation, autoformalization, and summarization tasks, even without access to token log probabilities. Our method incurs minimal computational overhead, requiring no auxiliary reranker models or modifications to the existing model.
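    A lightweight version of the similarity-based selection can be expressed in a few lines; difflib's ratio below stands in for the paper's parameter-free similarity functions, and the candidate generations are placeholders:

        from difflib import SequenceMatcher

        def select_most_consistent(candidates):
            """Return the generation with the highest mean pairwise similarity."""
            sim = lambda a, b: SequenceMatcher(None, a, b).ratio()
            scores = []
            for i, c in enumerate(candidates):
                others = [o for j, o in enumerate(candidates) if j != i]
                scores.append(sum(sim(c, o) for o in others) / len(others))
            return candidates[max(range(len(candidates)), key=scores.__getitem__)]

        samples = [   # e.g., several sampled completions from an LLM
            "def add(a, b):\n    return a + b",
            "def add(x, y):\n    return x + y",
            "def add(a, b):\n    return a - b",
        ]
        print(select_most_consistent(samples))   # picks the majority-consistent variant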
    Improved selective background Monte Carlo simulation at Belle II with graph attention networks and weighted events. (arXiv:2307.06434v1 [hep-ex])
    When measuring rare processes at Belle II, a huge luminosity is required, which means a large number of simulations are necessary to determine signal efficiencies and background contributions. However, this process demands high computation costs while most of the simulated data, in particular in the case of background, are discarded by the event selection. Thus, filters using graph neural networks are introduced at an early stage to save the resources for the detector simulation and reconstruction of events discarded at analysis level. In our work, we improved the performance of the filters using graph attention and investigated statistical methods, including sampling and reweighting, to deal with the biases introduced by the filtering.  ( 2 min )
    Why Guided Dialog Policy Learning performs well? Understanding the role of adversarial learning and its alternative. (arXiv:2307.06721v1 [cs.CL])
    Dialog policies, which determine a system's action based on the current state at each dialog turn, are crucial to the success of the dialog. In recent years, reinforcement learning (RL) has emerged as a promising option for dialog policy learning (DPL). In RL-based DPL, dialog policies are updated according to rewards. The manual construction of fine-grained rewards, such as state-action-based ones, to effectively guide the dialog policy is challenging in multi-domain task-oriented dialog scenarios with numerous state-action pair combinations. One way to estimate rewards from collected data is to train the reward estimator and dialog policy simultaneously using adversarial learning (AL). Although this method has demonstrated superior performance experimentally, it is fraught with the inherent problems of AL, such as mode collapse. This paper first identifies the role of AL in DPL through detailed analyses of the objective functions of dialog policy and reward estimator. Next, based on these analyses, we propose a method that eliminates AL from reward estimation and DPL while retaining its advantages. We evaluate our method using MultiWOZ, a multi-domain task-oriented dialog corpus.
    Breaking 3-Factor Approximation for Correlation Clustering in Polylogarithmic Rounds. (arXiv:2307.06723v1 [cs.DS])
    In this paper, we study parallel algorithms for the correlation clustering problem, where every pair of two different entities is labeled with similar or dissimilar. The goal is to partition the entities into clusters to minimize the number of disagreements with the labels. Currently, all efficient parallel algorithms have an approximation ratio of at least 3. In comparison with the $1.994+\epsilon$ ratio achieved by polynomial-time sequential algorithms [CLN22], a significant gap exists. We propose the first poly-logarithmic depth parallel algorithm that achieves a better approximation ratio than 3. Specifically, our algorithm computes a $(2.4+\epsilon)$-approximate solution and uses $\tilde{O}(m^{1.5})$ work. Additionally, it can be translated into a $\tilde{O}(m^{1.5})$-time sequential algorithm and a poly-logarithmic rounds sublinear-memory MPC algorithm with $\tilde{O}(m^{1.5})$ total memory. Our approach is inspired by Awerbuch, Khandekar, and Rao's [AKR12] length-constrained multi-commodity flow algorithm, where we develop an efficient parallel algorithm to solve a truncated correlation clustering linear program of Charikar, Guruswami, and Wirth [CGW05]. Then we show the solution of the truncated linear program can be rounded with a factor of at most 2.4 loss by using the framework of [CMSY15]. Such a rounding framework can then be implemented using parallel pivot-based approaches.
    Embodied Lifelong Learning for Task and Motion Planning. (arXiv:2307.06870v1 [cs.RO])
    A robot deployed in a home over long stretches of time faces a true lifelong learning problem. As it seeks to provide assistance to its users, the robot should leverage any accumulated experience to improve its own knowledge to become a more proficient assistant. We formalize this setting with a novel lifelong learning problem formulation in the context of learning for task and motion planning (TAMP). Exploiting the modularity of TAMP systems, we develop a generative mixture model that produces candidate continuous parameters for a planner. Whereas most existing lifelong learning approaches determine a priori how data is shared across task models, our approach learns shared and non-shared models and determines which to use online during planning based on auxiliary tasks that serve as a proxy for each model's understanding of a state. Our method exhibits substantial improvements in planning success on simulated 2D domains and on several problems from the BEHAVIOR benchmark.
    A survey on deep learning approaches for data integration in autonomous driving system. (arXiv:2306.11740v2 [cs.RO] UPDATED)
    The perception module of self-driving vehicles relies on a multi-sensor system to understand its environment. Recent advancements in deep learning have led to the rapid development of approaches that integrate multi-sensory measurements to enhance perception capabilities. This paper surveys the latest deep learning integration techniques applied to the perception module in autonomous driving systems, categorizing integration approaches based on "what, how, and when to integrate". A new taxonomy of integration is proposed, based on three dimensions: multi-view, multi-modality, and multi-frame. The integration operations and their pros and cons are summarized, providing new insights into the properties of an "ideal" data integration approach that can alleviate the limitations of existing methods. After reviewing hundreds of relevant papers, this survey concludes with a discussion of the key features of an optimal data integration approach.
    Vehicle Dispatching and Routing of On-Demand Intercity Ride-Pooling Services: A Multi-Agent Hierarchical Reinforcement Learning Approach. (arXiv:2307.06742v1 [eess.SY])
    The integrated development of city clusters has given rise to an increasing demand for intercity travel. Intercity ride-pooling services exhibit considerable potential for upgrading traditional intercity bus services by adding demand-responsive enhancements. Nevertheless, their online operation suffers from inherent complexities due to the coupling of vehicle resource allocation among cities and pooled-ride vehicle routing. To tackle these challenges, this study proposes a two-level framework designed to facilitate online fleet management. Specifically, a novel multi-agent feudal reinforcement learning model is proposed at the upper level of the framework to cooperatively assign idle vehicles to different intercity lines, while the lower level updates the routes of vehicles using an adaptive large neighborhood search heuristic. Numerical studies based on a realistic dataset of Xiamen and its surrounding cities in China show that the proposed framework effectively mitigates supply and demand imbalances and achieves significant improvements in both average daily system profit and order fulfillment ratio.
    HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models. (arXiv:2307.06949v1 [cs.CV])
    Personalization has emerged as a prominent aspect of generative AI, enabling the synthesis of individuals in diverse contexts and styles while retaining high fidelity to their identities. However, the process of personalization presents inherent challenges in terms of time and memory requirements: fine-tuning each personalized model requires considerable GPU time, and storing a personalized model per subject can be demanding in terms of storage capacity. To overcome these challenges, we propose HyperDreamBooth, a hypernetwork capable of efficiently generating a small set of personalized weights from a single image of a person. By composing these weights into the diffusion model, coupled with fast fine-tuning, HyperDreamBooth can generate a person's face in various contexts and styles, with high subject detail, while preserving the model's crucial knowledge of diverse styles and semantic modifications. Our method achieves personalization of faces in roughly 20 seconds, 25x faster than DreamBooth and 125x faster than Textual Inversion, using as few as one reference image, with the same quality and style diversity as DreamBooth. Our method also yields a model that is 10000x smaller than a normal DreamBooth model. Project page: https://hyperdreambooth.github.io
    S-HR-VQVAE: Sequential Hierarchical Residual Learning Vector Quantized Variational Autoencoder for Video Prediction. (arXiv:2307.06701v1 [cs.CV])
    We address the video prediction task by putting forth a novel model that combines (i) our recently proposed hierarchical residual vector quantized variational autoencoder (HR-VQVAE) and (ii) a novel spatiotemporal PixelCNN (ST-PixelCNN). We refer to this approach as a sequential hierarchical residual learning vector quantized variational autoencoder (S-HR-VQVAE). By leveraging the intrinsic capability of HR-VQVAE to model still images with a parsimonious representation, combined with ST-PixelCNN's ability to handle spatiotemporal information, S-HR-VQVAE can better address the chief challenges in video prediction: learning spatiotemporal information, handling high-dimensional data, combating blurry predictions, and implicitly modeling physical characteristics. Extensive experimental results on the KTH Human Action and Moving-MNIST tasks demonstrate that our model compares favorably against top video prediction techniques in both quantitative and qualitative evaluations, despite a much smaller model size. Finally, we boost S-HR-VQVAE by proposing a novel training method to jointly estimate the HR-VQVAE and ST-PixelCNN parameters.
    Unsupervised Calibration through Prior Adaptation for Text Classification using Large Language Models. (arXiv:2307.06713v1 [cs.CL])
    A wide variety of natural language tasks are currently being addressed with large-scale language models (LLMs). These models are usually trained on a very large amount of unsupervised text data and adapted to a downstream natural language task using methods like fine-tuning, calibration, or in-context learning. In this work, we propose an approach to adapt the prior class distribution for text classification tasks without the need for labelled samples, using only a few in-domain sample queries. The proposed approach treats the LLM as a black box, adding a stage in which the model posteriors are calibrated to the task. Results show that these methods outperform the un-adapted model for different numbers of training shots in the prompt, as well as a previous approach where calibration is performed without using any adaptation data.
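    To make the calibration step concrete, a minimal sketch of prior-shift reweighting follows: the black-box posteriors are multiplied by the ratio of an adapted prior to the model's implicit prior and renormalized. The EM-style prior estimator (in the style of Saerens et al., 2002) is a plausible stand-in, not necessarily the paper's exact procedure.

        import numpy as np

        def adapt_posteriors(posteriors, source_prior, target_prior):
            # posteriors: (n_samples, n_classes) probabilities from the LLM.
            # Reweight by Bayes' rule with a new class prior, then renormalize.
            w = np.asarray(target_prior) / np.asarray(source_prior)
            adapted = posteriors * w
            return adapted / adapted.sum(axis=1, keepdims=True)

        def estimate_target_prior(posteriors, source_prior, n_iter=20):
            # EM-style estimate of the in-domain prior from unlabelled queries.
            prior = np.asarray(source_prior, dtype=float)
            for _ in range(n_iter):
                adapted = adapt_posteriors(posteriors, source_prior, prior)
                prior = adapted.mean(axis=0)
            return prior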
    Cramer Type Distances for Learning Gaussian Mixture Models by Gradient Descent. (arXiv:2307.06753v1 [cs.LG])
    The learning of Gaussian Mixture Models (GMMs) plays an important role in machine learning. Known for their expressiveness and interpretability, Gaussian mixture models have a wide range of applications, from statistics and computer vision to distributional reinforcement learning. However, as of today, few known algorithms can fit or learn these models; examples include Expectation-Maximization and Sliced Wasserstein Distance. Even fewer algorithms are compatible with gradient descent, the common learning process for neural networks. In this paper, we derive a closed formula for the distance between two GMMs in the univariate, one-dimensional case, then propose a distance function called the Sliced Cram\'er 2-distance for learning general multivariate GMMs. Our approach has several advantages over many previous methods. First, it has a closed-form expression for the univariate case and is easy to compute and implement using common machine learning libraries (e.g., PyTorch and TensorFlow). Second, it is compatible with gradient descent, which enables us to integrate GMMs with neural networks seamlessly. Third, it can fit a GMM not only to a set of data points, but also to another GMM directly, without sampling from the target model. Fourth, it has theoretical guarantees such as global gradient boundedness and an unbiased sampling gradient. These features are especially useful for distributional reinforcement learning and Deep Q-Networks, where the goal is to learn a distribution over future rewards. We also construct a Gaussian Mixture Distributional Deep Q-Network as a toy example to demonstrate its effectiveness. Compared with previous models, this model is parameter-efficient in terms of representing a distribution and possesses better interpretability.
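    As a numerical illustration of the univariate case (the paper gives a closed form; here the integral of the squared CDF difference is simply approximated on a grid), a hedged PyTorch sketch for fitting one GMM to another by gradient descent:

        import torch
        from torch.distributions import Normal

        def gmm_cdf(x, logits, means, stds):
            # CDF of a univariate Gaussian mixture at grid points x.
            comp = Normal(means, stds).cdf(x.unsqueeze(-1))   # (n_points, n_comp)
            return comp @ torch.softmax(logits, dim=0)

        def cramer2_sq(x, p, q):
            # Squared Cramer 2-distance, trapezoidal approximation of
            # the integral of (F_p - F_q)^2 over the grid x.
            diff = gmm_cdf(x, *p) - gmm_cdf(x, *q)
            return torch.trapezoid(diff ** 2, x)

        x = torch.linspace(-10.0, 10.0, 2001)
        target = (torch.zeros(2), torch.tensor([-2.0, 3.0]), torch.tensor([1.0, 0.5]))
        logits = torch.zeros(2, requires_grad=True)
        mu = torch.tensor([-1.0, 1.0], requires_grad=True)
        log_s = torch.zeros(2, requires_grad=True)
        opt = torch.optim.Adam([logits, mu, log_s], lr=0.05)
        for _ in range(500):
            opt.zero_grad()
            loss = cramer2_sq(x, (logits, mu, log_s.exp()), target)
            loss.backward()
            opt.step()

    Note that the fit targets another GMM directly, with no sampling from the target model, which is the paper's third advantage.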
    A Novel Bayes' Theorem for Upper Probabilities. (arXiv:2307.06831v1 [stat.ML])
    In their seminal 1990 paper, Wasserman and Kadane establish an upper bound for the Bayes' posterior probability of a measurable set $A$, when the prior lies in a class of probability measures $\mathcal{P}$ and the likelihood is precise. They also give a sufficient condition for such upper bound to hold with equality. In this paper, we introduce a generalization of their result by additionally addressing uncertainty related to the likelihood. We give an upper bound for the posterior probability when both the prior and the likelihood belong to a set of probabilities. Furthermore, we give a sufficient condition for this upper bound to become an equality. This result is interesting on its own, and has the potential of being applied to various fields of engineering (e.g. model predictive control), machine learning, and artificial intelligence.
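    For intuition on a finite parameter space, a toy sketch of the upper posterior probability when the prior (and, for the paper's generalization, the likelihood) ranges over a finite set; everything here is an illustrative simplification of the measure-theoretic statement.

        import numpy as np

        def upper_posterior(priors, likelihoods, A):
            # priors, likelihoods: lists of arrays over a finite Theta.
            # A: boolean mask selecting the event of interest in Theta.
            best = 0.0
            for prior in priors:
                for lik in likelihoods:
                    post = prior * lik
                    post = post / post.sum()
                    best = max(best, post[A].sum())
            return best

        # Two candidate priors and one precise likelihood over Theta = {0, 1, 2};
        # a single-element likelihood list recovers Wasserman and Kadane's setting.
        priors = [np.array([0.2, 0.3, 0.5]), np.array([0.5, 0.3, 0.2])]
        likelihoods = [np.array([0.1, 0.6, 0.3])]
        A = np.array([True, False, True])
        print(upper_posterior(priors, likelihoods, A))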
    Frameless Graph Knowledge Distillation. (arXiv:2307.06631v1 [cs.LG])
    Knowledge distillation (KD) has shown great potential for transferring knowledge from a complex teacher model to a simple student model, in which a heavy learning task can be accomplished efficiently without losing too much prediction accuracy. Recently, many attempts have been made to apply the KD mechanism to graph representation learning models such as graph neural networks (GNNs) to accelerate inference via student models. However, many existing KD-based GNNs utilize an MLP as a universal approximator in the student model to imitate the teacher model's process without considering the graph knowledge from the teacher model. In this work, we provide a KD-based framework on multi-scaled GNNs, known as graph framelets, and prove that by adequately utilizing the graph knowledge in a multi-scaled manner provided by graph framelet decomposition, the student model is capable of adapting to both homophilic and heterophilic graphs and has the potential of alleviating the over-squashing issue with a simple yet effective graph surgery. Furthermore, we show how the graph knowledge supplied by the teacher is learned and digested by the student model via both algebra and geometry. Comprehensive experiments show that our proposed model can achieve learning accuracy identical to or even surpassing that of the teacher model while maintaining high inference speed.
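    For reference, a minimal sketch of the generic logit-matching distillation objective that such frameworks build on (in the style of Hinton et al.); the framelet-specific multi-scale transfer contributed by this paper is not reproduced here.

        import torch
        import torch.nn.functional as F

        def distillation_loss(student_logits, teacher_logits, labels,
                              T=2.0, alpha=0.5):
            # Soft targets: KL between temperature-softened distributions.
            soft = F.kl_div(
                F.log_softmax(student_logits / T, dim=-1),
                F.softmax(teacher_logits / T, dim=-1),
                reduction="batchmean",
            ) * T * T  # rescale gradients to match the hard-label term
            hard = F.cross_entropy(student_logits, labels)
            return alpha * soft + (1 - alpha) * hard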
    Testing Sparsity Assumptions in Bayesian Networks. (arXiv:2307.06406v1 [stat.ML])
    Bayesian network (BN) structure discovery algorithms typically either make assumptions about the sparsity of the true underlying network, or are limited by computational constraints to networks with a small number of variables. While these sparsity assumptions can take various forms, frequently the assumptions focus on an upper bound for the maximum in-degree of the underlying graph $\nabla_G$. Theorem 2 in Duttweiler et al. (2023) demonstrates that the largest eigenvalue of the normalized inverse covariance matrix ($\Omega$) of a linear BN is a lower bound for $\nabla_G$. Building on this result, this paper provides the asymptotic properties of, and a debiasing procedure for, the sample eigenvalues of $\Omega$, leading to a hypothesis test that may be used to determine if the BN has max in-degree greater than 1. A linear BN structure discovery workflow is suggested in which the investigator uses this hypothesis test to aid in selecting an appropriate structure discovery algorithm. The hypothesis test's performance is evaluated through simulations, and the workflow is demonstrated on data from a human psoriasis study.
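    A minimal sketch of the test statistic's main ingredient follows; the unit-diagonal scaling of the precision matrix is an assumption about the paper's normalization convention.

        import numpy as np

        def omega_max_eigenvalue(X):
            # X: (n_samples, n_vars) data from a linear BN.
            prec = np.linalg.inv(np.cov(X, rowvar=False))   # inverse covariance
            d = np.sqrt(np.diag(prec))
            omega = prec / np.outer(d, d)                   # normalize to unit diagonal
            return np.linalg.eigvalsh(omega).max()          # lower bound for max in-degree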
    Machine Learning practices and infrastructures. (arXiv:2307.06518v1 [cs.CY])
    Machine Learning (ML) systems, particularly when deployed in high-stakes domains, are deeply consequential. They can exacerbate existing inequities, create new modes of discrimination, and reify outdated social constructs. Accordingly, the social context (i.e., organisations, teams, cultures) in which ML systems are developed is a site of active research for the field of AI ethics, and of intervention for policymakers. This paper focuses on one aspect of social context that is often overlooked: interactions between practitioners and the tools they rely on, and the role these interactions play in shaping ML practices and the development of ML systems. In particular, through an empirical study of questions asked on the Stack Exchange forums, the use of interactive computing platforms (e.g., Jupyter Notebook and Google Colab) in ML practices is explored. I find that interactive computing platforms are used in a host of learning and coordination practices, which constitutes an infrastructural relationship between interactive computing platforms and ML practitioners. I describe how ML practices are co-evolving alongside the development of interactive computing platforms, and highlight how this risks rendering invisible those aspects of the ML life cycle that AI ethics researchers have demonstrated to be particularly salient for the societal impact of deployed ML systems.
    MPR-Net: Multi-Scale Pattern Reproduction Guided Universality Time Series Interpretable Forecasting. (arXiv:2307.06736v1 [cs.LG])
    Time series forecasting has received wide interest from existing research due to its broad applications and inherent challenges. The research challenge lies in identifying effective patterns in historical series and applying them to future forecasting. Advanced models based on point-wise connected MLP and Transformer architectures have strong fitting power, but their quadratic computational complexity limits their practicality. Additionally, those structures inherently disrupt the temporal order, reducing information utilization and making the forecasting process uninterpretable. To solve these problems, this paper proposes a forecasting model, MPR-Net. It first adaptively decomposes multi-scale historical series patterns using a convolution operation, then constructs a pattern extension forecasting method based on the prior knowledge of pattern reproduction, and finally reconstructs future patterns into future series using a deconvolution operation. By leveraging the temporal dependencies present in the time series, MPR-Net not only achieves linear time complexity but also makes the forecasting process interpretable. Extensive experiments on more than ten real-world datasets covering both short- and long-term forecasting tasks show that MPR-Net achieves state-of-the-art forecasting performance, as well as good generalization and robustness.
    A Novel Site-Agnostic Multimodal Deep Learning Model to Identify Pro-Eating Disorder Content on Social Media. (arXiv:2307.06775v1 [cs.LG])
    Over the last decade, there has been a vast increase in eating disorder diagnoses and eating disorder-attributed deaths, reaching their zenith during the Covid-19 pandemic. This immense growth derived in part from the stressors of the pandemic, but also from increased exposure to social media, which is rife with content that promotes eating disorders and can induce them in viewers. This study aimed to create a multimodal deep learning model capable of determining whether a given social media post promotes eating disorders, based on a combination of visual and textual data. A labeled dataset of Tweets was collected from Twitter, on which twelve deep learning models were trained and tested. The most effective model was the multimodal fusion of the RoBERTa natural language processing model and the MaxViT image classification model, attaining an accuracy of 95.9% and an F1 score of 0.959. The RoBERTa and MaxViT fusion model, deployed to classify an unlabeled dataset of posts from the social media sites Tumblr and Reddit, produced classifications in line with previous research studies that did not employ artificial intelligence, showing that artificial intelligence can develop insights congruent with those of researchers. Additionally, the model was used to conduct a time-series analysis of previously unseen Tweets from eight Twitter hashtags, uncovering that the relative abundance of pro-eating disorder content has decreased drastically; however, since approximately 2018, it has either stopped declining or risen once more in prevalence.
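    A hedged sketch of the kind of late-fusion head such a model uses is given below; the embedding sizes (768 for a RoBERTa-base pooled output, 512 for an assumed MaxViT feature vector) and the two-class head are illustrative assumptions, not the paper's exact architecture.

        import torch
        import torch.nn as nn

        class FusionClassifier(nn.Module):
            # Concatenate precomputed text and image embeddings, then classify.
            def __init__(self, text_dim=768, image_dim=512, hidden=256):
                super().__init__()
                self.head = nn.Sequential(
                    nn.Linear(text_dim + image_dim, hidden),
                    nn.ReLU(),
                    nn.Dropout(0.3),
                    nn.Linear(hidden, 2),   # pro-eating-disorder vs. benign
                )

            def forward(self, text_emb, image_emb):
                return self.head(torch.cat([text_emb, image_emb], dim=-1))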
    inTformer: A Time-Embedded Attention-Based Transformer for Crash Likelihood Prediction at Intersections Using Connected Vehicle Data. (arXiv:2307.03854v2 [cs.LG] UPDATED)
    The real-time crash likelihood prediction model is an essential component of proactive traffic safety management systems. Over the years, numerous studies have attempted to construct crash likelihood prediction models in order to enhance traffic safety, but mostly on freeways. In the majority of existing studies, researchers have primarily employed deep learning-based frameworks to identify crash potential. Lately, the Transformer has emerged as a powerful deep neural network architecture that fundamentally operates through attention mechanisms. The Transformer has several functional benefits over extant deep learning models such as Long Short-Term Memory (LSTM) and Convolutional Neural Networks (CNNs). First, it can readily handle long-term dependencies in a data sequence. Second, it can process all elements of a data sequence in parallel during training. Finally, it does not suffer from the vanishing gradient issue. Realizing the immense potential of Transformers, this paper proposes inTersection-Transformer (inTformer), a time-embedded attention-based Transformer model that can effectively predict intersection crash likelihood in real-time. The proposed model was evaluated using connected vehicle data extracted from INRIX and the Center for Advanced Transportation Technology (CATT) Lab's Signal Analytics Platform. The data were formatted and stacked at different timesteps to develop nine inTformer models. The best inTformer model achieved a sensitivity of 73%. This model was also compared to earlier studies on crash likelihood prediction at intersections and to several established deep learning models trained on the same connected vehicle dataset. In every scenario, the inTformer outperformed the benchmark models, confirming the viability of the proposed architecture.
    An Improved Uniform Convergence Bound with Fat-Shattering Dimension. (arXiv:2307.06644v1 [cs.LG])
    The fat-shattering dimension characterizes the uniform convergence property of real-valued functions. The state-of-the-art upper bounds feature a multiplicative squared logarithmic factor on the sample complexity, leaving an open gap with the existing lower bound. We provide an improved uniform convergence bound that closes this gap.
    GRAN is superior to GraphRNN: node orderings, kernel- and graph embeddings-based metrics for graph generators. (arXiv:2307.06709v1 [cs.LG])
    A wide variety of generative models for graphs have been proposed. They are used in drug discovery, road networks, neural architecture search, and program synthesis. Generating graphs poses theoretical challenges, such as isomorphic representations, so evaluating how well a generative model performs is difficult. Which model should one choose for a given application domain? We extensively study kernel-based metrics on distributions of graph invariants, as well as manifold-based and kernel-based metrics in graph embedding space. Manifold-based metrics outperform kernel-based metrics in embedding space. We use these metrics to compare GraphRNN and GRAN, two well-known generative models for graphs, and unveil the influence of node orderings. The comparison shows the superiority of GRAN over GraphRNN; further, our proposed adaptation of GraphRNN with a depth-first search ordering is effective for small-sized graphs. A guideline on good practices regarding dataset selection and node feature initialization is provided. Our work is accompanied by open-source code and reproducible experiments.
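    As one concrete instance of the kernel-based metrics studied, a minimal sketch of squared maximum mean discrepancy (MMD) with a Gaussian kernel over fixed-length graph descriptor vectors; the descriptor choice (e.g., degree histograms) is an illustrative assumption.

        import numpy as np

        def gaussian_mmd_sq(X, Y, sigma=1.0):
            # X, Y: (n_graphs, d) descriptor vectors for two sets of graphs.
            def k(A, B):
                d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
                return np.exp(-d2 / (2 * sigma ** 2))
            return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()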
    Temporal Label-Refinement for Weakly-Supervised Audio-Visual Event Localization. (arXiv:2307.06385v1 [cs.CV])
    Audio-Visual Event Localization (AVEL) is the task of temporally localizing and classifying \emph{audio-visual events}, i.e., events simultaneously visible and audible in a video. In this paper, we solve AVEL in a weakly-supervised setting, where only video-level event labels (their presence/absence, but not their locations in time) are available as supervision for training. Our idea is to use a base model to estimate labels on the training data at a finer temporal resolution than at the video level and re-train the model with these labels. I.e., we determine the subset of labels for each \emph{slice} of frames in a training video by (i) replacing the frames outside the slice with those from a second video having no overlap in video-level labels, and (ii) feeding this synthetic video into the base model to extract labels for just the slice in question. To handle the out-of-distribution nature of our synthetic videos, we propose an auxiliary objective for the base model that induces more reliable predictions of the localized event labels as desired. Our three-stage pipeline outperforms several existing AVEL methods with no architectural changes and improves performance on a related weakly-supervised task as well.
    Tackling Combinatorial Distribution Shift: A Matrix Completion Perspective. (arXiv:2307.06457v1 [cs.LG])
    Obtaining rigorous statistical guarantees for generalization under distribution shift remains an open and active research area. We study a setting we call combinatorial distribution shift, where (a) under the test- and training-distributions, the labels $z$ are determined by pairs of features $(x,y)$, (b) the training distribution has coverage of certain marginal distributions over $x$ and $y$ separately, but (c) the test distribution involves examples from a product distribution over $(x,y)$ that is not covered by the training distribution. Focusing on the special case where the labels are given by bilinear embeddings into a Hilbert space $H$: $\mathbb{E}[z \mid x,y ]=\langle f_{\star}(x),g_{\star}(y)\rangle_{H}$, we aim to extrapolate to a test distribution domain that is not covered in training, i.e., achieving bilinear combinatorial extrapolation. Our setting generalizes a special case of matrix completion from missing-not-at-random data, for which all existing results require the ground-truth matrices to be either exactly low-rank, or to exhibit very sharp spectral cutoffs. In this work, we develop a series of theoretical results that enable bilinear combinatorial extrapolation under gradual spectral decay as observed in typical high-dimensional data, including novel algorithms, generalization guarantees, and linear-algebraic results. A key tool is a novel perturbation bound for the rank-$k$ singular value decomposition approximations between two matrices that depends on the relative spectral gap rather than the absolute spectral gap, a result that may be of broader independent interest.
    Discovering How Agents Learn Using Few Data. (arXiv:2307.06640v1 [cs.GT])
    Decentralized learning algorithms are an essential tool for designing multi-agent systems, as they enable agents to autonomously learn from their experience and past interactions. In this work, we propose a theoretical and algorithmic framework for real-time identification of the learning dynamics that govern agent behavior, using a short burst of a single system trajectory. Our method identifies agent dynamics through polynomial regression, where we compensate for limited data by incorporating side-information constraints that capture fundamental assumptions or expectations about agent behavior. These constraints are enforced computationally using sum-of-squares optimization, leading to a hierarchy of increasingly better approximations of the true agent dynamics. Extensive experiments demonstrate that our approach, using only 5 samples from a short run of a single trajectory, accurately recovers the true dynamics across various benchmarks, including equilibrium selection and prediction of chaotic systems up to 10 Lyapunov times. These findings suggest that our approach has significant potential to support effective policy and decision-making in strategic multi-agent systems.
    Quantum Autoencoders for Learning Quantum Channel Codes. (arXiv:2307.06622v1 [quant-ph])
    This work investigates the application of quantum machine learning techniques for classical and quantum communication across different qubit channel models. By employing parameterized quantum circuits and a flexible channel noise model, we develop a machine learning framework to generate quantum channel codes and evaluate their effectiveness. We explore classical, entanglement-assisted, and quantum communication scenarios within our framework. Applying it to various quantum channel models as proof of concept, we demonstrate strong performance in each case. Our results highlight the potential of quantum machine learning in advancing research on quantum communication systems, enabling a better understanding of capacity bounds under modulation constraints, various communication settings, and diverse channel models.
    Data Augmentation is a Hyperparameter: Cherry-picked Self-Supervision for Unsupervised Anomaly Detection is Creating the Illusion of Success. (arXiv:2208.07734v5 [cs.LG] UPDATED)
    Self-supervised learning (SSL) has emerged as a promising alternative for creating supervisory signals for real-world problems, avoiding the extensive cost of manual labeling. SSL is particularly attractive for unsupervised tasks such as anomaly detection (AD), where labeled anomalies are rare or often nonexistent. A large catalog of augmentation functions has been used for SSL-based AD (SSAD) on image data, and recent works have reported that the type of augmentation has a significant impact on accuracy. Motivated by these findings, this work sets out to put image-based SSAD under a larger lens and investigate the role of data augmentation in SSAD. Through extensive experiments on 3 different detector models and across 420 AD tasks, we provide comprehensive numerical and visual evidence that the alignment between data augmentation and the anomaly-generating mechanism is the key to the success of SSAD, and that in the lack thereof, SSL may even impair accuracy. To the best of our knowledge, this is the first meta-analysis on the role of data augmentation in SSAD.
    DRAGON: A Dialogue-Based Robot for Assistive Navigation with Visual Language Grounding. (arXiv:2307.06924v1 [cs.RO])
    Persons with visual impairments (PwVI) have difficulties understanding and navigating spaces around them. Current wayfinding technologies either focus solely on navigation or provide limited communication about the environment. Motivated by recent advances in visual-language grounding and semantic navigation, we propose DRAGON, a guiding robot powered by a dialogue system and the ability to associate the environment with natural language. By understanding the commands from the user, DRAGON is able to guide the user to the desired landmarks on the map, describe the environment, and answer questions from visual observations. Through effective utilization of dialogue, the robot can ground the user's free-form descriptions to landmarks in the environment, and give the user semantic information through spoken language. We conduct a user study with blindfolded participants in an everyday indoor environment. Our results demonstrate that DRAGON is able to communicate with the user smoothly, provide a good guiding experience, and connect users with their surrounding environment in an intuitive manner.
    Sequential Experimental Design for X-Ray CT Using Deep Reinforcement Learning. (arXiv:2307.06343v1 [eess.IV])
    In X-ray Computed Tomography (CT), projections from many angles are acquired and used for 3D reconstruction. To make CT suitable for in-line quality control, reducing the number of angles while maintaining reconstruction quality is necessary. Sparse-angle tomography is a popular approach for obtaining 3D reconstructions from limited data. To optimize its performance, one can adapt scan angles sequentially to select the most informative angles for each scanned object. Mathematically, this corresponds to solving an optimal experimental design (OED) problem. OED problems are high-dimensional, non-convex, bi-level optimization problems that cannot be solved online, i.e., during the scan. To address these challenges, we pose the OED problem as a partially observable Markov decision process in a Bayesian framework, and solve it through deep reinforcement learning. The approach learns efficient non-greedy policies to solve a given class of OED problems through extensive offline training rather than solving a given OED problem directly via numerical optimization. As such, the trained policy can successfully find the most informative scan angles online. We use a policy training method based on the Actor-Critic approach and evaluate its performance on 2D tomography with synthetic data.
    Multivariate Time Series characterization and forecasting of VoIP traffic in real mobile networks. (arXiv:2307.06645v1 [cs.NI])
    Predicting the behavior of real-time traffic (e.g., VoIP) in mobility scenarios could help operators better plan their network infrastructures and optimize the allocation of resources. Accordingly, in this work the authors propose a forecasting analysis of crucial QoS/QoE descriptors (some of which are neglected in the technical literature) of VoIP traffic in a real mobile environment. The problem is formulated as a multivariate time series analysis. Such a formalization allows us to discover and model the temporal relationships among various descriptors and to forecast their behavior for future periods. Techniques such as Vector Autoregressive models and machine learning (deep-based and tree-based) approaches are employed and compared in terms of performance and time complexity, by reframing the multivariate time series problem as a supervised learning one. Moreover, a series of auxiliary analyses (stationarity, orthogonal impulse responses, etc.) are performed to uncover the analytical structure of the time series and to provide deep insights into their relationships. The whole theoretical analysis has an experimental counterpart: a set of trials across a real-world LTE-Advanced environment was performed to collect, post-process, and analyze about 600,000 voice packets, organized per flow and differentiated per codec.
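    The reframing step mentioned above is the standard sliding-window construction; a minimal sketch follows, where the window length and horizon are illustrative parameters.

        import numpy as np

        def make_supervised(series, n_lags, horizon=1):
            # series: (T, d) multivariate time series.
            # X: flattened n_lags past steps; y: the value `horizon` steps ahead.
            X, y = [], []
            for t in range(n_lags, len(series) - horizon + 1):
                X.append(series[t - n_lags:t].ravel())
                y.append(series[t + horizon - 1])
            return np.array(X), np.array(y)

    Any tree-based or deep regressor can then be fit on (X, y) and compared against a Vector Autoregressive baseline.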
    Assessment of the suitability of degradation models for the planning of CCTV inspections of sewer pipes. (arXiv:2307.06341v1 [cs.LG])
    The degradation of sewer pipes poses significant economic, environmental, and health concerns. The maintenance of such assets requires structured plans to perform inspections, which are more efficient when structural and environmental features are considered along with the results of previous inspection reports. The development of such plans requires degradation models that can be based on statistical and machine learning methods. This work proposes a methodology to assess their suitability for inspection planning along three dimensions: accuracy metrics, ability to produce long-term degradation curves, and explainability. Results suggest that although ensemble models yield the highest accuracy, they are unable to infer the long-term degradation of the pipes, whereas Logistic Regression offers a slightly less accurate model that is able to produce consistent degradation curves with high explainability. A use case demonstrates this methodology and the efficiency of model-based planning compared to the current inspection plan.
    Stochastic Delay Differential Games: Financial Modeling and Machine Learning Algorithms. (arXiv:2307.06450v1 [math.OC])
    In this paper, we propose a numerical methodology for finding the closed-loop Nash equilibrium of stochastic delay differential games through deep learning. These games are prevalent in finance and economics, where multi-agent interaction and delayed effects are often desired features in a model but are introduced at the expense of increased dimensionality of the problem. This increased dimensionality is especially significant because the dimensionality arising from the number of players is coupled with the potentially infinite dimensionality caused by the delay. Our approach involves parameterizing the controls of each player using distinct recurrent neural networks. These recurrent neural network-based controls are then trained using a modified version of Brown's fictitious play, incorporating deep learning techniques. To evaluate the effectiveness of our methodology, we test it on finance-related problems with known solutions. Furthermore, we develop new problems and derive their analytical Nash equilibrium solutions, which serve as additional benchmarks for assessing the performance of our proposed deep learning approach.  ( 2 min )
    Trainability, Expressivity and Interpretability in Gated Neural ODEs. (arXiv:2307.06398v1 [cs.LG])
    Understanding how the dynamics in biological and artificial neural networks implement the computations required for a task is a salient open question in machine learning and neuroscience. In particular, computations requiring complex memory storage and retrieval pose a significant challenge for these networks to implement or learn. Recently, a family of models described by neural ordinary differential equations (nODEs) has emerged as powerful dynamical neural network models capable of capturing complex dynamics. Here, we extend nODEs by endowing them with adaptive timescales using gating interactions. We refer to these as gated neural ODEs (gnODEs). Using a task that requires memory of continuous quantities, we demonstrate the inductive bias of the gnODEs to learn (approximate) continuous attractors. We further show how reduced-dimensional gnODEs retain their modeling power while greatly improving interpretability, even allowing explicit visualization of the structure of learned attractors. We introduce a novel measure of expressivity which probes the capacity of a neural network to generate complex trajectories. Using this measure, we explore how the phase-space dimension of the nODEs and the complexity of the function modeling the flow field contribute to expressivity. We see that a more complex function for modeling the flow field allows a lower-dimensional nODE to capture a given target dynamics. Finally, we demonstrate the benefit of gating in nODEs on several real-world tasks.  ( 2 min )
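    To illustrate the gating idea only (not the paper's exact parameterization), a forward-Euler sketch of a gated flow field, where a learned sigmoid gate rescales the dynamics and thereby yields state-dependent, adaptive timescales:

        import torch
        import torch.nn as nn

        class GatedODECell(nn.Module):
            def __init__(self, dim, hidden=64):
                super().__init__()
                self.flow = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                          nn.Linear(hidden, dim))
                self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

            def forward(self, x, dt=0.05, steps=100):
                # Euler integration of dx/dt = g(x) * (-x + f(x)):
                # the gate g adapts the effective timescale per state.
                for _ in range(steps):
                    x = x + dt * self.gate(x) * (-x + self.flow(x))
                return x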
    Artificial Intelligence for Drug Discovery: Are We There Yet?. (arXiv:2307.06521v1 [cs.AI])
    Drug discovery is adapting to novel technologies such as data science, informatics, and artificial intelligence (AI) to accelerate effective treatment development while reducing costs and animal experiments. AI is transforming drug discovery, as indicated by increasing interest from investors, industrial and academic scientists, and legislators. Successful drug discovery requires optimizing properties related to pharmacodynamics, pharmacokinetics, and clinical outcomes. This review discusses the use of AI in the three pillars of drug discovery: diseases, targets, and therapeutic modalities, with a focus on small molecule drugs. AI technologies, such as generative chemistry, machine learning, and multi-property optimization, have enabled several compounds to enter clinical trials. The scientific community must carefully vet known information to address the reproducibility crisis. The full potential of AI in drug discovery can only be realized with sufficient ground truth and appropriate human intervention at later pipeline stages.
    Metal Oxide-based Gas Sensor Array for the VOCs Analysis in Complex Mixtures using Machine Learning. (arXiv:2307.06556v1 [physics.app-ph])
    Detection of Volatile Organic Compounds (VOCs) from breath is becoming a viable route for the early, non-invasive detection of diseases. This paper presents a sensor array with three metal oxide electrodes that can use machine learning methods to identify four distinct VOCs in a mixture. The metal oxide sensor array was subjected to various VOC concentrations, including ethanol, acetone, toluene and chloroform. The dataset obtained from individual gases and their mixtures was analyzed using multiple machine learning algorithms, such as Random Forest (RF), K-Nearest Neighbor (KNN), Decision Tree, Linear Regression, Logistic Regression, Naive Bayes, Linear Discriminant Analysis, Artificial Neural Network, and Support Vector Machine. KNN and RF showed more than 99% accuracy in classifying the chemicals in the gas mixtures. In regression analysis, KNN delivered the best results, with an R2 value above 0.99 and limits of detection (LOD) of 0.012, 0.015, 0.014 and 0.025 PPM for predicting the concentrations of acetone, toluene, ethanol, and chloroform, respectively, in complex mixtures. Therefore, it is demonstrated that the array, together with the provided algorithms, can classify and predict the concentrations of the four gases simultaneously for disease diagnosis and treatment monitoring.  ( 2 min )
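    A minimal sketch of the multi-output KNN regression used for concentration prediction; the synthetic stand-in data (500 three-electrode readings mapped linearly to four concentrations) replaces the paper's measured dataset.

        import numpy as np
        from sklearn.neighbors import KNeighborsRegressor
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import r2_score

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 3))                     # sensor responses (assumed)
        y = X @ rng.normal(size=(3, 4)) + 0.05 * rng.normal(size=(500, 4))

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        knn = KNeighborsRegressor(n_neighbors=5).fit(X_tr, y_tr)
        print(r2_score(y_te, knn.predict(X_te)))          # multi-output R^2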
    Microbial Genetic Algorithm-based Black-box Attack against Interpretable Deep Learning Systems. (arXiv:2307.06496v1 [cs.CV])
    Deep learning models are susceptible to adversarial samples in both white-box and black-box environments. Although previous studies have shown high attack success rates, coupling DNN models with interpretation models could offer a sense of security when a human expert is involved, who can identify whether a given sample is benign or malicious. However, in white-box environments, interpretable deep learning systems (IDLSes) have been shown to be vulnerable to malicious manipulations. In black-box settings, as access to the components of IDLSes is limited, it becomes more challenging for the adversary to fool the system. In this work, we propose QuScore, a query-efficient score-based black-box attack against IDLSes that requires no knowledge of the target model or its coupled interpretation model. QuScore combines transfer-based and score-based methods with an effective microbial genetic algorithm. Our method is designed to reduce the number of queries necessary to carry out successful attacks, resulting in a more efficient process. By continuously refining the adversarial samples based on feedback scores from the IDLS, our approach effectively navigates the search space to identify perturbations that can fool the system. We evaluate the attack's effectiveness on four CNN models (Inception, ResNet, VGG, DenseNet) and two interpretation models (CAM, Grad), using both ImageNet and CIFAR datasets. Our results show that the proposed approach is query-efficient, with a high attack success rate that can reach between 95% and 100%, and transferable with an average success rate of 69% on the ImageNet and CIFAR datasets. Our attack method generates adversarial examples with attribution maps that resemble benign samples. We have also demonstrated that our attack is resilient against various preprocessing defense techniques and can easily be transferred to different DNN models.  ( 3 min )
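    The microbial genetic algorithm at the core of the search is itself simple; a hedged sketch of the generic variant (Harvey's microbial GA) follows, with the attack-specific, query-efficient scoring abstracted into a black-box fitness function.

        import numpy as np

        def microbial_ga(fitness, dim, pop_size=50, n_rounds=5000,
                         cross_rate=0.5, mut_scale=0.02, seed=0):
            rng = np.random.default_rng(seed)
            pop = rng.uniform(-1.0, 1.0, size=(pop_size, dim))
            for _ in range(n_rounds):
                i, j = rng.choice(pop_size, size=2, replace=False)
                win, lose = (i, j) if fitness(pop[i]) >= fitness(pop[j]) else (j, i)
                mask = rng.random(dim) < cross_rate
                pop[lose, mask] = pop[win, mask]               # loser copies winner
                pop[lose] += mut_scale * rng.normal(size=dim)  # then mutates
            return max(pop, key=fitness)

    Each tournament costs a couple of fitness evaluations, which in the black-box setting translate directly into model queries, hence the emphasis on query efficiency.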
    Convergence of Message Passing Graph Neural Networks with Generic Aggregation On Large Random Graphs. (arXiv:2304.11140v2 [stat.ML] UPDATED)
    We study the convergence of message passing graph neural networks on random graph models to their continuous counterpart as the number of nodes tends to infinity. Until now, this convergence was only known for architectures with aggregation functions in the form of normalized means, or, equivalently, of an application of classical operators like the adjacency matrix or the graph Laplacian. We extend such results to a large class of aggregation functions, that encompasses all classically used message passing graph neural networks, such as attention-based message passing, max convolutional message passing or (degree-normalized) convolutional message passing. Under mild assumptions, we give non-asymptotic bounds with high probability to quantify this convergence. Our main result is based on the McDiarmid inequality. Interestingly, this result does not apply to the case where the aggregation is a coordinate-wise maximum. We treat this case separately and obtain a different convergence rate.
    Deep Network Approximation: Beyond ReLU to Diverse Activation Functions. (arXiv:2307.06555v1 [cs.LG])
    This paper explores the expressive power of deep neural networks for a diverse range of activation functions. An activation function set $\mathscr{A}$ is defined to encompass the majority of commonly used activation functions, such as $\mathtt{ReLU}$, $\mathtt{LeakyReLU}$, $\mathtt{ReLU}^2$, $\mathtt{ELU}$, $\mathtt{SELU}$, $\mathtt{Softplus}$, $\mathtt{GELU}$, $\mathtt{SiLU}$, $\mathtt{Swish}$, $\mathtt{Mish}$, $\mathtt{Sigmoid}$, $\mathtt{Tanh}$, $\mathtt{Arctan}$, $\mathtt{Softsign}$, $\mathtt{dSiLU}$, and $\mathtt{SRS}$. We demonstrate that for any activation function $\varrho\in \mathscr{A}$, a $\mathtt{ReLU}$ network of width $N$ and depth $L$ can be approximated to arbitrary precision by a $\varrho$-activated network of width $6N$ and depth $2L$ on any bounded set. This finding enables the extension of most approximation results achieved with $\mathtt{ReLU}$ networks to a wide variety of other activation functions, at the cost of slightly larger constants.  ( 2 min )
    Convolutional Neural Networks for Sentiment Analysis on Weibo Data: A Natural Language Processing Approach. (arXiv:2307.06540v1 [cs.CL])
    This study addressed the complex task of sentiment analysis on a dataset of 119,988 original tweets from Weibo using a Convolutional Neural Network (CNN), offering a new approach to Natural Language Processing (NLP). The data, sourced from Baidu's PaddlePaddle AI platform, were meticulously preprocessed, tokenized, and categorized based on sentiment labels. A CNN-based model was utilized, leveraging word embeddings for feature extraction, and trained to perform sentiment classification. The model achieved a macro-average F1-score of approximately 0.73 on the test set, showing balanced performance across positive, neutral, and negative sentiments. The findings underscore the effectiveness of CNNs for sentiment analysis tasks, with implications for practical applications in social media analysis, market research, and policy studies. The complete experimental content and code have been made publicly available on the Kaggle data platform for further research and development. Future work may involve exploring different architectures, such as Recurrent Neural Networks (RNN) or transformers, or using more complex pre-trained models like BERT, to further improve the model's ability to understand linguistic nuances and context.  ( 2 min )
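    A hedged sketch of the CNN-over-embeddings architecture such a study typically uses (the filter widths, embedding size, and three-class head are illustrative assumptions, not the paper's exact configuration):

        import torch
        import torch.nn as nn

        class TextCNN(nn.Module):
            def __init__(self, vocab_size, emb_dim=128, n_classes=3,
                         widths=(3, 4, 5), n_filters=100):
                super().__init__()
                self.emb = nn.Embedding(vocab_size, emb_dim)
                self.convs = nn.ModuleList(
                    [nn.Conv1d(emb_dim, n_filters, w) for w in widths])
                self.fc = nn.Linear(n_filters * len(widths), n_classes)

            def forward(self, tokens):                      # (batch, seq_len)
                x = self.emb(tokens).transpose(1, 2)        # (batch, emb, seq)
                feats = [conv(x).relu().max(dim=2).values for conv in self.convs]
                return self.fc(torch.cat(feats, dim=1))     # logits over sentiments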
    Introducing Foundation Models as Surrogate Models: Advancing Towards More Practical Adversarial Attacks. (arXiv:2307.06608v1 [cs.LG])
    Recently, the no-box adversarial attack, in which the attacker lacks access to the model's architecture, weights, and training data, has become the most practical yet challenging attack setup. However, the potential and flexibility inherent in the surrogate model selection process in this no-box setting remain largely unexplored. Inspired by the burgeoning interest in utilizing foundation models to address downstream tasks, this paper adopts two ideas: 1) recasting the adversarial attack as a downstream task, specifically image noise generation, and 2) introducing foundation models as surrogate models. Harnessing the concept of non-robust features, we elaborate on two guiding principles for surrogate model selection that explain why a foundation model is an optimal choice for this role. Paradoxically, however, we observe that these foundation models underperform. Analyzing this unexpected behavior within the feature space, we attribute the lackluster performance of foundation models (e.g., CLIP) to their significant representational capacity and, conversely, their lack of discriminative prowess. To mitigate this issue, we propose a margin-based loss strategy for fine-tuning foundation models on target images. The experimental results verify that our approach, which employs the basic Fast Gradient Sign Method (FGSM) attack algorithm, outstrips the performance of other, more intricate algorithms. We conclude by advocating for the research community to consider surrogate models as a crucial determinant of the effectiveness of adversarial attacks in no-box settings. The implications of our work bear relevance for improving the efficacy of such adversarial attacks and the overall robustness of AI systems.  ( 3 min )
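    The FGSM step named above is essentially a one-liner; a minimal sketch (the [0, 1] clamp assumes inputs normalized to that range):

        import torch
        import torch.nn.functional as F

        def fgsm(model, x, y, eps):
            # Perturb the input along the sign of the loss gradient.
            x = x.clone().detach().requires_grad_(True)
            F.cross_entropy(model(x), y).backward()
            return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()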
    Equalization in Dispersion-Managed Systems Using Learned Digital Back-Propagation. (arXiv:2307.06821v1 [cs.NI])
    In this paper, we investigate the use of the learned digital back-propagation (LDBP) for equalizing dual-polarization fiber-optic transmission in dispersion-managed (DM) links. LDBP is a deep neural network that optimizes the parameters of DBP using the stochastic gradient descent. We evaluate DBP and LDBP in a simulated WDM dual-polarization fiber transmission system operating at the bitrate of 256 Gbit/s per channel, with a dispersion map designed for a 2016 km link with 15% residual dispersion. Our results show that in single-channel transmission, LDBP achieves an effective signal-to-noise ratio improvement of 6.3 dB and 2.5 dB, respectively, over linear equalization and DBP. In WDM transmission, the corresponding $Q$-factor gains are 1.1 dB and 0.4 dB, respectively. Additionally, we conduct a complexity analysis, which reveals that a frequency-domain implementation of LDBP and DBP is more favorable in terms of complexity than the time-domain implementation. These findings demonstrate the effectiveness of LDBP in mitigating the nonlinear effects in DM fiber-optic transmission systems.
    Misclassification in Automated Content Analysis Causes Bias in Regression. Can We Fix It? Yes We Can!. (arXiv:2307.06483v1 [cs.LG])
    Automated classifiers (ACs), often built via supervised machine learning (SML), can categorize large, statistically powerful samples of data ranging from text to images and video, and have become widely popular measurement devices in communication science and related fields. Despite this popularity, even highly accurate classifiers make errors that cause misclassification bias and misleading results in downstream analyses, unless such analyses account for these errors. As we show in a systematic literature review of SML applications, communication scholars largely ignore misclassification bias. In principle, existing statistical methods can use "gold standard" validation data, such as that created by human annotators, to correct misclassification bias and produce consistent estimates. We introduce and test such methods, including a new method we design and implement in the R package misclassificationmodels, via Monte Carlo simulations designed to reveal each method's limitations; we release these simulations as well. Based on our results, we recommend our new error correction method as it is versatile and efficient. In sum, automated classifiers, even those below common accuracy standards or making systematic misclassifications, can be useful for measurement with careful study design and appropriate error correction methods.
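    For intuition, a Python sketch of one classical correction, the matrix method, which estimates the misclassification matrix on gold-standard data and inverts it to recover class proportions; the paper's regression-level estimators (implemented in R in misclassificationmodels) go well beyond this.

        import numpy as np

        def corrected_proportions(pred_labels, gold_pred, gold_true, n_classes):
            # C[i, j] = P(predicted i | true j), estimated on validation data.
            C = np.zeros((n_classes, n_classes))
            for p, t in zip(gold_pred, gold_true):
                C[p, t] += 1
            C /= C.sum(axis=0, keepdims=True)
            observed = np.bincount(pred_labels, minlength=n_classes) / len(pred_labels)
            return np.linalg.solve(C, observed)   # corrected true-class proportions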
    Energy Discrepancies: A Score-Independent Loss for Energy-Based Models. (arXiv:2307.06431v1 [stat.ML])
    Energy-based models are a simple yet powerful class of probabilistic models, but their widespread adoption has been limited by the computational burden of training them. We propose a novel loss function called Energy Discrepancy (ED) which does not rely on the computation of scores or expensive Markov chain Monte Carlo. We show that ED approaches the explicit score matching and negative log-likelihood loss under different limits, effectively interpolating between both. Consequently, minimum ED estimation overcomes the problem of nearsightedness encountered in score-based estimation methods, while also enjoying theoretical guarantees. Through numerical experiments, we demonstrate that ED learns low-dimensional data distributions faster and more accurately than explicit score matching or contrastive divergence. For high-dimensional image data, we describe how the manifold hypothesis puts limitations on our approach and demonstrate the effectiveness of energy discrepancy by training the energy-based model as a prior of a variational decoder model.  ( 2 min )
    Leveraging Contextual Counterfactuals Toward Belief Calibration. (arXiv:2307.06513v1 [cs.AI])
    Beliefs and values are increasingly being incorporated into our AI systems through alignment processes, such as carefully curating data collection principles or regularizing the loss function used for training. However, the meta-alignment problem is that these human beliefs are diverse and not aligned across populations; furthermore, the implicit strength of each belief may not be well calibrated even among humans, especially when trying to generalize across contexts. Specifically, in high-regret situations, we observe that contextual counterfactuals and recourse costs are particularly important in updating a decision maker's beliefs and the strengths to which such beliefs are held. Therefore, we argue that including counterfactuals is key to an accurate calibration of beliefs during alignment. To do this, we first segment belief diversity into two categories: subjectivity (across individuals within a population) and epistemic uncertainty (within an individual across different contexts). By leveraging our notion of epistemic uncertainty, we introduce the 'belief calibration cycle' framework to more holistically calibrate this diversity of beliefs with context-driven counterfactual reasoning, using multi-objective optimization. We empirically apply our framework to find a Pareto frontier of clustered optimal belief strengths that generalize across different contexts, demonstrating its efficacy on a toy dataset for credit decisions.
    On the Effective Horizon of Inverse Reinforcement Learning. (arXiv:2307.06541v1 [cs.LG])
    Inverse reinforcement learning (IRL) algorithms often rely on (forward) reinforcement learning or planning over a given time horizon to compute an approximately optimal policy for a hypothesized reward function and then match this policy with expert demonstrations. The time horizon plays a critical role in determining both the accuracy of reward estimate and the computational efficiency of IRL algorithms. Interestingly, an effective time horizon shorter than the ground-truth value often produces better results faster. This work formally analyzes this phenomenon and provides an explanation: the time horizon controls the complexity of an induced policy class and mitigates overfitting with limited data. This analysis leads to a principled choice of the effective horizon for IRL. It also prompts us to reexamine the classic IRL formulation: it is more natural to learn jointly the reward and the effective horizon together rather than the reward alone with a given horizon. Our experimental results confirm the theoretical analysis.  ( 2 min )
    Online Distributed Learning with Quantized Finite-Time Coordination. (arXiv:2307.06620v1 [cs.LG])
    In this paper we consider online distributed learning problems. Online distributed learning refers to the process of training learning models on distributed data sources. In our setting, a set of agents needs to cooperatively train a learning model from streaming data. Unlike federated learning, the proposed approach does not rely on a central server but only on peer-to-peer communications among the agents. This approach is often used in scenarios where data cannot be moved to a centralized location due to privacy, security, or cost reasons. To overcome the absence of a central server, we propose a distributed algorithm that relies on a quantized, finite-time coordination protocol to aggregate the locally trained models. Furthermore, our algorithm allows for the use of stochastic gradients during local training. Stochastic gradients are computed using a randomly sampled subset of the local training data, which makes the proposed algorithm more efficient and scalable than traditional gradient descent. We analyze the performance of the proposed algorithm in terms of the mean distance from the online solution. Finally, we present numerical results for a logistic regression task.  ( 2 min )
    Deep Neural Networks for Semiparametric Frailty Models via H-likelihood. (arXiv:2307.06581v1 [stat.ML])
    For prediction of clustered time-to-event data, we propose a new deep neural network based gamma frailty model (DNN-FM). An advantage of the proposed model is that the joint maximization of the new h-likelihood provides maximum likelihood estimators for fixed parameters and best unbiased predictors for random frailties. Thus, the proposed DNN-FM is trained by using a negative profiled h-likelihood as a loss function, constructed by profiling out the non-parametric baseline hazard. Experimental studies show that the proposed method enhances the prediction performance of the existing methods. A real data analysis shows that the inclusion of subject-specific frailties helps to improve prediction of the DNN based Cox model (DNN-Cox).  ( 2 min )
    Prescriptive Process Monitoring Under Resource Constraints: A Reinforcement Learning Approach. (arXiv:2307.06564v1 [cs.AI])
    Prescriptive process monitoring methods seek to optimize the performance of business processes by triggering interventions at runtime, thereby increasing the probability of positive case outcomes. These interventions are triggered according to an intervention policy. Reinforcement learning has been put forward as an approach to learning intervention policies through trial and error. Existing approaches in this space assume that the number of resources available to perform interventions in a process is unlimited, an unrealistic assumption in practice. This paper argues that, in the presence of resource constraints, a key dilemma in the field of prescriptive process monitoring is to trigger interventions based not only on predictions of their necessity, timeliness, or effect, but also on the uncertainty of these predictions and the level of resource utilization. Indeed, committing scarce resources to an intervention when its necessity or effects are highly uncertain may intuitively lead to suboptimal intervention effects. Accordingly, the paper proposes a reinforcement learning approach for prescriptive process monitoring that leverages conformal prediction techniques to account for the uncertainty of the predictions upon which an intervention decision is based. An evaluation using real-life datasets demonstrates that explicitly modeling uncertainty using conformal predictions helps reinforcement learning agents converge towards policies with higher net intervention gain.  ( 2 min )
    No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models. (arXiv:2307.06440v1 [cs.LG])
    The computation necessary for training Transformer-based language models has skyrocketed in recent years. This trend has motivated research on efficient training algorithms designed to improve training, validation, and downstream performance faster than standard training. In this work, we revisit three categories of such algorithms: dynamic architectures (layer stacking, layer dropping), batch selection (selective backprop, RHO loss), and efficient optimizers (Lion, Sophia). When pre-training BERT and T5 with a fixed computation budget using such methods, we find that their training, validation, and downstream gains vanish compared to a baseline with a fully-decayed learning rate. We define an evaluation protocol that enables computation to be done on arbitrary machines by mapping all computation time to a reference machine which we call reference system time. We discuss the limitations of our proposed protocol and release our code to encourage rigorous research in efficient training procedures: https://github.com/JeanKaddour/NoTrainNoGain.  ( 2 min )
    Tensor Decompositions Meet Control Theory: Learning General Mixtures of Linear Dynamical Systems. (arXiv:2307.06538v1 [cs.LG])
    Recently Chen and Poor initiated the study of learning mixtures of linear dynamical systems. While linear dynamical systems already have wide-ranging applications in modeling time-series data, using mixture models can lead to a better fit or even a richer understanding of underlying subpopulations represented in the data. In this work we give a new approach to learning mixtures of linear dynamical systems that is based on tensor decompositions. As a result, our algorithm succeeds without strong separation conditions on the components, and can be used to compete with the Bayes optimal clustering of the trajectories. Moreover our algorithm works in the challenging partially-observed setting. Our starting point is the simple but powerful observation that the classic Ho-Kalman algorithm is a close relative of modern tensor decomposition methods for learning latent variable models. This gives us a playbook for how to extend it to work with more complicated generative models.  ( 2 min )
    DSV: An Alignment Validation Loss for Self-supervised Outlier Model Selection. (arXiv:2307.06534v1 [cs.LG])
    Self-supervised learning (SSL) has proven effective in solving various problems by generating internal supervisory signals. Unsupervised anomaly detection, which faces the high cost of obtaining true labels, is an area that can greatly benefit from SSL. However, recent literature suggests that tuning the hyperparameters (HP) of data augmentation functions is crucial to the success of SSL-based anomaly detection (SSAD), yet a systematic method for doing so remains unknown. In this work, we propose DSV (Discordance and Separability Validation), an unsupervised validation loss to select high-performing detection models with effective augmentation HPs. DSV captures the alignment between an augmentation function and the anomaly-generating mechanism with surrogate losses, which approximate the discordance and separability of test data, respectively. As a result, the evaluation via DSV leads to selecting an effective SSAD model exhibiting better alignment, which results in high detection accuracy. We theoretically derive the degree of approximation conducted by the surrogate losses and empirically show that DSV outperforms a wide range of baselines on 21 real-world tasks.  ( 2 min )
    Ageing Analysis of Embedded SRAM on a Large-Scale Testbed Using Machine Learning. (arXiv:2307.06693v1 [cs.AR])
    Ageing detection and failure prediction are essential in many Internet of Things (IoT) deployments, which operate huge quantities of embedded devices unattended in the field for years. In this paper, we present a large-scale empirical analysis of natural SRAM wear-out using 154 boards from a general-purpose testbed. Starting from SRAM initialization bias, which each node can easily collect at startup, we apply various metrics for feature extraction and experiment with common machine learning methods to predict the age of operation for this node. Our findings indicate that even though ageing impacts are subtle, our indicators can estimate usage times well, achieving an $R^2$ score of 0.77 and a mean error of 24% using regressors, and an F1 score above 0.6 for classifiers at a six-month resolution.
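    A minimal sketch of such a pipeline (the feature choices, data shapes, and model are assumptions for illustration, not the paper's exact setup):
        import numpy as np
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        bias_maps = rng.random((154, 4096))        # stand-in startup-bias bitmap per board
        ages = rng.uniform(0, 60, size=154)        # stand-in age of operation in months

        # feature extraction: simple distribution statistics of each bias map
        X = np.column_stack([bias_maps.mean(1), bias_maps.std(1),
                             np.percentile(bias_maps, 10, axis=1),
                             np.percentile(bias_maps, 90, axis=1)])

        model = RandomForestRegressor(n_estimators=200, random_state=0)
        print(cross_val_score(model, X, ages, scoring="r2", cv=5).mean())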
    IntelliGraphs: Datasets for Benchmarking Knowledge Graph Generation. (arXiv:2307.06698v1 [cs.AI])
    Knowledge Graph Embedding (KGE) models are used to learn continuous representations of entities and relations. A key task in the literature is predicting missing links between entities. However, Knowledge Graphs are not just sets of links but also have semantics underlying their structure. Semantics is crucial in several downstream tasks, such as query answering or reasoning. We introduce the subgraph inference task, where a model has to generate likely and semantically valid subgraphs. We propose IntelliGraphs, a set of five new Knowledge Graph datasets. The IntelliGraphs datasets contain subgraphs with semantics expressed in logical rules for evaluating subgraph inference. We also present the dataset generator that produced the synthetic datasets. We designed four novel baseline models, which include three models based on traditional KGEs. We evaluate their expressiveness and show that these models cannot capture the semantics. We believe this benchmark will encourage the development of machine learning models that emphasize semantic understanding.
    Bregman Deviations of Generic Exponential Families. (arXiv:2201.07306v4 [cs.LG] UPDATED)
    We revisit the method-of-mixtures technique, also known as the Laplace method, to study the concentration phenomenon in generic exponential families. Combining the properties of the Bregman divergence associated with the log-partition function of the family with the method of mixtures for super-martingales, we establish a generic bound controlling the Bregman divergence between the parameter of the family and a finite sample estimate of the parameter. Our bound is time-uniform and involves a quantity that extends the classical information gain to exponential families, which we call the Bregman information gain. For the practitioner, we instantiate this novel bound to several classical families, e.g., Gaussian, Bernoulli, Exponential, Weibull, Pareto, Poisson and Chi-square, yielding explicit forms of the confidence sets and the Bregman information gain. We further numerically compare the resulting confidence bounds to state-of-the-art alternatives for time-uniform concentration and show that this novel method yields competitive results. Finally, we highlight the benefit of our concentration bounds on some illustrative applications.
    Spectral-Bias and Kernel-Task Alignment in Physically Informed Neural Networks. (arXiv:2307.06362v1 [stat.ML])
    Physically informed neural networks (PINNs) are a promising emerging method for solving differential equations. As in many other deep learning approaches, the choice of PINN design and training protocol requires careful craftsmanship. Here, we suggest a comprehensive theoretical framework that sheds light on this important problem. Leveraging an equivalence between infinitely over-parameterized neural networks and Gaussian process regression (GPR), we derive an integro-differential equation that governs PINN prediction in the large data-set limit -- the Neurally-Informed Equation (NIE). This equation augments the original one by a kernel term reflecting architecture choices and allows quantifying implicit bias induced by the network via a spectral decomposition of the source term in the original differential equation.
    Incomplete Utterance Rewriting as Sequential Greedy Tagging. (arXiv:2307.06337v1 [cs.LG])
    The task of incomplete utterance rewriting has recently attracted much attention. Previous models struggled to extract information from the dialogue context, as evidenced by their low restoration scores. To address this issue, we propose a novel sequence tagging-based model, which is more adept at extracting information from context. Meanwhile, we introduce speaker-aware embedding to model speaker variation. Experiments on multiple public datasets show that our model achieves the best results on all nine restoration scores while keeping other metric scores comparable to previous state-of-the-art models. Furthermore, benefiting from the model's simplicity, our approach outperforms most previous models in inference speed.
  • Open

    balance -- a Python package for balancing biased data samples. (arXiv:2307.06024v2 [stat.CO] UPDATED)
    Surveys are an important research tool, providing unique measurements on subjective experiences such as sentiment and opinions that cannot be measured by other means. However, because survey data is collected from a self-selected group of participants, directly inferring insights from it to a population of interest, or training ML models on such data, can lead to erroneous estimates or under-performing models. In this paper we present balance, an open-source Python package by Meta, offering a simple workflow for analyzing and adjusting biased data samples with respect to a population of interest. The balance workflow includes three steps: understanding the initial bias in the data relative to a target we would like to infer, adjusting the data to correct for the bias by producing weights for each unit in the sample based on propensity scores, and evaluating the final biases and the variance inflation after applying the fitted weights. The package provides a simple API that can be used by researchers and data scientists from a wide range of fields on a variety of data. The paper provides the relevant context, methodological background, and presents the package's API.
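    For readers who want the gist of the three-step workflow without the package, here is a generic inverse-propensity-weighting sketch in scikit-learn; this is not the balance API, just the underlying idea it automates:
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def propensity_weights(sample_X, target_X):
            # steps 1-2: model P(unit is in the sample | covariates) against the target,
            # then weight sampled units by the odds of belonging to the target
            X = np.vstack([sample_X, target_X])
            y = np.r_[np.ones(len(sample_X)), np.zeros(len(target_X))]
            p = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(sample_X)[:, 1]
            w = (1 - p) / p
            return w / w.mean()                  # normalized weights

        def design_effect(w):
            # step 3: Kish design effect, a standard variance-inflation diagnostic
            return len(w) * (w ** 2).sum() / w.sum() ** 2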
    Convergence of Message Passing Graph Neural Networks with Generic Aggregation On Large Random Graphs. (arXiv:2304.11140v2 [stat.ML] UPDATED)
    We study the convergence of message passing graph neural networks on random graph models to their continuous counterpart as the number of nodes tends to infinity. Until now, this convergence was only known for architectures with aggregation functions in the form of normalized means, or, equivalently, of an application of classical operators like the adjacency matrix or the graph Laplacian. We extend such results to a large class of aggregation functions, that encompasses all classically used message passing graph neural networks, such as attention-based message passing, max convolutional message passing or (degree-normalized) convolutional message passing. Under mild assumptions, we give non-asymptotic bounds with high probability to quantify this convergence. Our main result is based on the McDiarmid inequality. Interestingly, this result does not apply to the case where the aggregation is a coordinate-wise maximum. We treat this case separately and obtain a different convergence rate.
    Tensor Completion Made Practical. (arXiv:2006.03134v2 [cs.DS] CROSS LISTED)
    Tensor completion is a natural higher-order generalization of matrix completion where the goal is to recover a low-rank tensor from sparse observations of its entries. Existing algorithms are either heuristic without provable guarantees, based on solving large semidefinite programs which are impractical to run, or make strong assumptions such as requiring the factors to be nearly orthogonal. In this paper we introduce a new variant of alternating minimization, which in turn is inspired by understanding how the progress measures that guide convergence of alternating minimization in the matrix setting need to be adapted to the tensor setting. We show strong provable guarantees, including showing that our algorithm converges linearly to the true tensors even when the factors are highly correlated, and can be implemented in nearly linear time. Moreover our algorithm is also highly practical and we show that we can complete third order tensors with a thousand dimensions from observing a tiny fraction of their entries. In contrast, and somewhat surprisingly, we show that the standard version of alternating minimization, without our new twist, can converge at a drastically slower rate in practice.
    Emergent Neural Network Mechanisms for Generalization to Objects in Novel Orientations. (arXiv:2109.13445v2 [cs.CV] UPDATED)
    The capability of Deep Neural Networks (DNNs) to recognize objects in orientations outside the distribution of the training data is not well understood. We present evidence that DNNs are capable of generalizing to objects in novel orientations by disseminating orientation-invariance obtained from familiar objects seen from many viewpoints. This capability strengthens when training the DNN with an increasing number of familiar objects, but only in orientations that involve 2D rotations of familiar orientations. We show that this dissemination is achieved via neurons tuned to common features between familiar and unfamiliar objects. These results implicate brain-like neural mechanisms for generalization.
    Bayesian taut splines for estimating the number of modes. (arXiv:2307.05825v1 [stat.ME] CROSS LISTED)
    The number of modes in a probability density function is representative of the model's complexity and can also be viewed as the number of existing subpopulations. Despite its relevance, little research has been devoted to its estimation. Focusing on the univariate setting, we propose a novel approach targeting prediction accuracy inspired by some overlooked aspects of the problem. We argue for the need for structure in the solutions, the subjective and uncertain nature of modes, and the convenience of a holistic view blending global and local density properties. Our method builds upon a combination of flexible kernel estimators and parsimonious compositional splines. Feature exploration, model selection and mode testing are implemented in the Bayesian inference paradigm, providing soft solutions and allowing expert judgement to be incorporated in the process. The usefulness of our proposal is illustrated through a case study in sports analytics, showcasing multiple companion visualisation tools. A thorough simulation study demonstrates that traditional modality-driven approaches paradoxically struggle to provide accurate results. In this context, our method emerges as a top-tier alternative offering innovative solutions for analysts.
    Towards Learning to Imitate from a Single Video Demonstration. (arXiv:1901.07186v4 [cs.LG] UPDATED)
    Agents that can learn to imitate given video observation -- without direct access to state or action information -- are more applicable to learning in the natural world. However, formulating a reinforcement learning (RL) agent that facilitates this goal remains a significant challenge. We approach this challenge using contrastive training to learn a reward function comparing an agent's behaviour with a single demonstration. We use a Siamese recurrent neural network architecture to learn rewards in space and time between motion clips while training an RL policy to minimize this distance. Through experimentation, we also find that the inclusion of multi-task data and additional image encoding losses improve the temporal consistency of the learned rewards and, as a result, significantly improves policy learning. We demonstrate our approach on simulated humanoid, dog, and raptor agents in 2D and a quadruped and a humanoid in 3D. We show that our method outperforms current state-of-the-art techniques in these environments and can learn to imitate from a single video demonstration.
    Accelerated stochastic approximation with state-dependent noise. (arXiv:2307.01497v2 [math.OC] UPDATED)
    We consider a class of stochastic smooth convex optimization problems under rather general assumptions on the noise in the stochastic gradient observation. As opposed to the classical problem setting in which the variance of noise is assumed to be uniformly bounded, herein we assume that the variance of stochastic gradients is related to the "sub-optimality" of the approximate solutions delivered by the algorithm. Such problems naturally arise in a variety of applications, in particular, in the well-known generalized linear regression problem in statistics. However, to the best of our knowledge, none of the existing stochastic approximation algorithms for solving this class of problems attain optimality in terms of the dependence on accuracy, problem parameters, and mini-batch size. We discuss two non-Euclidean accelerated stochastic approximation routines--stochastic accelerated gradient descent (SAGD) and stochastic gradient extrapolation (SGE)--which carry a particular duality relationship. We show that both SAGD and SGE, under appropriate conditions, achieve the optimal convergence rate, attaining the optimal iteration and sample complexities simultaneously. However, corresponding assumptions for the SGE algorithm are more general; they allow, for instance, for efficient application of the SGE to statistical estimation problems under heavy tail noises and discontinuous score functions. We also discuss the application of the SGE to problems satisfying quadratic growth conditions, and show how it can be used to recover sparse solutions. Finally, we report on some simulation experiments to illustrate numerical performance of our proposed algorithms in high-dimensional settings.
    Cramer Type Distances for Learning Gaussian Mixture Models by Gradient Descent. (arXiv:2307.06753v1 [cs.LG])
    The learning of Gaussian Mixture Models (also referred to simply as GMMs) plays an important role in machine learning. Known for their expressiveness and interpretability, Gaussian mixture models have a wide range of applications, from statistics and computer vision to distributional reinforcement learning. However, as of today, few known algorithms can fit or learn these models, some of which include Expectation-Maximization algorithms and Sliced Wasserstein Distance. Even fewer algorithms are compatible with gradient descent, the common learning process for neural networks. In this paper, we derive a closed formula for the distance between two GMMs in the univariate, one-dimensional case, then propose a distance function called the Sliced Cramér 2-distance for learning general multivariate GMMs. Our approach has several advantages over many previous methods. First, it has a closed-form expression for the univariate case and is easy to compute and implement using common machine learning libraries (e.g., PyTorch and TensorFlow). Second, it is compatible with gradient descent, which enables us to integrate GMMs with neural networks seamlessly. Third, it can fit a GMM not only to a set of data points, but also to another GMM directly, without sampling from the target model. And fourth, it has some theoretical guarantees like global gradient boundedness and unbiased sampling gradient. These features are especially useful for distributional reinforcement learning and Deep Q Networks, where the goal is to learn a distribution over future rewards. We will also construct a Gaussian Mixture Distributional Deep Q Network as a toy example to demonstrate its effectiveness. Compared with previous models, this model is parameter efficient in terms of representing a distribution and possesses better interpretability.  ( 2 min )
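    A hedged sketch of the idea (not the paper's closed form): the univariate Cramér 2-distance between two GMMs can be approximated by numerically integrating the squared gap between their CDFs, and the result stays differentiable in PyTorch:
        import torch

        def gmm_cdf(x, w, mu, sigma):
            # x: (G,) grid; w, mu, sigma: (K,) mixture parameters
            z = (x[:, None] - mu[None, :]) / sigma[None, :]
            return (w[None, :] * 0.5 * (1 + torch.erf(z / 2 ** 0.5))).sum(-1)

        def cramer2(p, q, lo=-10.0, hi=10.0, grid=2048):
            # squared Cramér 2-distance: integral of (F_p - F_q)^2 over the grid
            x = torch.linspace(lo, hi, grid)
            gap = gmm_cdf(x, *p) - gmm_cdf(x, *q)
            return torch.trapz(gap ** 2, x)

        mu = torch.tensor([0.0, 2.0], requires_grad=True)
        loss = cramer2((torch.tensor([0.5, 0.5]), mu, torch.ones(2)),
                       (torch.tensor([0.5, 0.5]), torch.tensor([1.0, 3.0]), torch.ones(2)))
        loss.backward()            # gradients w.r.t. the GMM parameters flow through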
    On the Validity of Conformal Prediction for Network Data Under Non-Uniform Sampling. (arXiv:2306.07252v4 [math.ST] UPDATED)
    We study the properties of conformal prediction for network data under various sampling mechanisms that commonly arise in practice but often result in a non-representative sample of nodes. We interpret these sampling mechanisms as selection rules applied to a superpopulation and study the validity of conformal prediction conditional on an appropriate selection event. We show that the sampled subarray is exchangeable conditional on the selection event if the selection rule satisfies a permutation invariance property and a joint exchangeability condition holds for the superpopulation. Our result implies the finite-sample validity of conformal prediction for certain selection events related to ego networks and snowball sampling. We also show that when data are sampled via a random walk on a graph, a variant of weighted conformal prediction yields asymptotically valid prediction sets for an independently selected node from the population.
    Learning Graph ARMA Processes from Time-Vertex Spectra. (arXiv:2302.06887v2 [stat.ML] UPDATED)
    The modeling of time-varying graph signals as stationary time-vertex stochastic processes permits the inference of missing signal values by efficiently employing the correlation patterns of the process across different graph nodes and time instants. In this study, we propose an algorithm for computing graph autoregressive moving average (graph ARMA) processes based on learning the joint time-vertex power spectral density of the process from its incomplete realizations for the task of signal interpolation. Our solution relies on first roughly estimating the joint spectrum of the process from partially observed realizations and then refining this estimate by projecting it onto the spectrum manifold of the graph ARMA process through convex relaxations. The initially missing signal values are then estimated based on the learnt model. Experimental results show that the proposed approach achieves high accuracy in time-vertex signal estimation problems.
    An Improved Uniform Convergence Bound with Fat-Shattering Dimension. (arXiv:2307.06644v1 [cs.LG])
    The fat-shattering dimension characterizes the uniform convergence property of real-valued functions. The state-of-the-art upper bounds feature a multiplicative squared logarithmic factor on the sample complexity, leaving an open gap with the existing lower bound. We provide an improved uniform convergence bound that closes this gap.
    Robust online active learning. (arXiv:2302.00422v5 [stat.ML] UPDATED)
    In many industrial applications, obtaining labeled observations is not straightforward as it often requires the intervention of human experts or the use of expensive testing equipment. In these circumstances, active learning can be highly beneficial in suggesting the most informative data points to be used when fitting a model. Reducing the number of observations needed for model development alleviates both the computational burden required for training and the operational expenses related to labeling. Online active learning, in particular, is useful in high-volume production processes where the decision about the acquisition of the label for a data point needs to be taken within an extremely short time frame. However, despite the recent efforts to develop online active learning strategies, the behavior of these methods in the presence of outliers has not been thoroughly examined. In this work, we investigate the performance of online active linear regression in contaminated data streams. Our study shows that the currently available query strategies are prone to sample outliers, whose inclusion in the training set eventually degrades the predictive performance of the models. To address this issue, we propose a solution that bounds the search area of a conditional D-optimal algorithm and uses a robust estimator. Our approach strikes a balance between exploring unseen regions of the input space and protecting against outliers. Through numerical simulations, we show that the proposed method is effective in improving the performance of online active learning in the presence of outliers, thus expanding the potential applications of this powerful tool.  ( 3 min )
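    A minimal sketch of the proposal as described (the bound, threshold, and regularization are illustrative assumptions, not the paper's tuned values):
        import numpy as np
        from sklearn.linear_model import HuberRegressor

        def should_query(x, X_seen, center, radius=3.0, tau=1.0):
            # conditional D-optimality: how much would labeling x improve the design?
            XtX = X_seen.T @ X_seen + 1e-6 * np.eye(X_seen.shape[1])
            leverage = x @ np.linalg.solve(XtX, x)
            # bounded search area: refuse points far from the data center (outlier guard)
            in_bounds = np.linalg.norm(x - center) <= radius
            return in_bounds and leverage > tau

        model = HuberRegressor()   # robust estimator for label outliers that slip through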
    Multivariate Time Series characterization and forecasting of VoIP traffic in real mobile networks. (arXiv:2307.06645v1 [cs.NI])
    Predicting the behavior of real-time traffic (e.g., VoIP) in mobility scenarios could help the operators to better plan their network infrastructures and to optimize the allocation of resources. Accordingly, in this work the authors propose a forecasting analysis of crucial QoS/QoE descriptors (some of which are neglected in the technical literature) of VoIP traffic in a real mobile environment. The problem is formulated in terms of a multivariate time series analysis. Such a formalization allows us to discover and model the temporal relationships among various descriptors and to forecast their behavior over future periods. Techniques such as Vector Autoregressive models and machine learning (deep-based and tree-based) approaches are employed and compared in terms of performance and time complexity, by reframing the multivariate time series problem into a supervised learning one. Moreover, a series of auxiliary analyses (stationarity, orthogonal impulse responses, etc.) are performed to discover the analytical structure of the time series and to provide deep insights about their relationships. The whole theoretical analysis has an experimental counterpart, since a set of trials across a real-world LTE-Advanced environment has been performed to collect, post-process and analyze about 600,000 voice packets, organized per flow and differentiated per codec.  ( 2 min )
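    As a sketch of the VAR portion of such an analysis with statsmodels (the file name and column layout are assumptions):
        import pandas as pd
        from statsmodels.tsa.api import VAR

        # one row per time window, one column per QoS/QoE descriptor (assumed layout)
        df = pd.read_csv("voip_descriptors.csv", parse_dates=["t"]).set_index("t")
        res = VAR(df).fit(maxlags=12, ic="aic")               # lag order chosen by AIC
        print(res.forecast(df.values[-res.k_ar:], steps=10))  # forecast 10 windows ahead
        print(res.irf(10).orth_irfs.shape)                    # orthogonal impulse responses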
    Multiple Testing Framework for Out-of-Distribution Detection. (arXiv:2206.09522v4 [stat.ML] UPDATED)
    We study the problem of Out-of-Distribution (OOD) detection, that is, detecting whether a learning algorithm's output can be trusted at inference time. While a number of tests for OOD detection have been proposed in prior work, a formal framework for studying this problem is lacking. We propose a definition for the notion of OOD that includes both the input distribution and the learning algorithm, which provides insights for the construction of powerful tests for OOD detection. We propose a multiple hypothesis testing inspired procedure to systematically combine any number of different statistics from the learning algorithm using conformal p-values. We further provide strong guarantees on the probability of incorrectly classifying an in-distribution sample as OOD. In our experiments, we find that threshold-based tests proposed in prior work perform well in specific settings, but not uniformly well across different types of OOD instances. In contrast, our proposed method that combines multiple statistics performs uniformly well across different datasets and neural networks.  ( 2 min )
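    The core primitive is simple; below is a sketch of a conformal p-value for one OOD statistic plus a generic Fisher combination (the paper's actual combination procedure is more careful about validity guarantees than this illustration):
        import numpy as np
        from scipy.stats import combine_pvalues

        def conformal_pvalue(cal_scores, test_score):
            # higher score = more OOD-like; valid under exchangeability
            cal_scores = np.asarray(cal_scores)
            return (1 + np.sum(cal_scores >= test_score)) / (len(cal_scores) + 1)

        def ood_pvalue(cal_score_lists, test_scores):
            # one conformal p-value per statistic, combined into a single decision
            ps = [conformal_pvalue(c, t) for c, t in zip(cal_score_lists, test_scores)]
            return combine_pvalues(ps, method="fisher")[1]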
    Adversarial Policies Beat Superhuman Go AIs. (arXiv:2211.00241v4 [cs.LG] UPDATED)
    We attack the state-of-the-art Go-playing AI system KataGo by training adversarial policies against it, achieving a >97% win rate against KataGo running at superhuman settings. Our adversaries do not win by playing Go well. Instead, they trick KataGo into making serious blunders. Our attack transfers zero-shot to other superhuman Go-playing AIs, and is comprehensible to the extent that human experts can implement it without algorithmic assistance to consistently beat superhuman AIs. The core vulnerability uncovered by our attack persists even in KataGo agents adversarially trained to defend against our attack. Our results demonstrate that even superhuman AI systems may harbor surprising failure modes. Example games are available at https://goattack.far.ai/.  ( 2 min )
    An Image-Denoising Framework Fit for Quantum Annealing via QUBO and Restricted Boltzmann Machines. (arXiv:2307.06542v1 [quant-ph])
    We investigate a framework for binary image denoising via restricted Boltzmann machines (RBMs) that introduces a denoising objective in quadratic unconstrained binary optimization (QUBO) form and is well-suited for quantum annealing. The denoising objective is attained by balancing the distribution learned by a trained RBM with a penalty term for deviations from the noisy image. We derive the statistically optimal choice of the penalty parameter assuming the target distribution has been well-approximated, and further suggest an empirically supported modification to make the method robust to that idealistic assumption. We also show under additional assumptions that the denoised images attained by our method are, in expectation, strictly closer to the noise-free images than the noisy images are. While we frame the model as an image denoising model, it can be applied to any binary data. As the QUBO formulation is well-suited for implementation on quantum annealers, we test the model on a D-Wave Advantage machine, and also test on data too large for current quantum annealers by approximating QUBO solutions through classical heuristics.  ( 2 min )
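    A sketch of how such an objective could be assembled as a QUBO, assuming a pretrained RBM with weight matrix W and biases b (visible), c (hidden), a noisy binary image y, and penalty weight lam (all names are illustrative):
        import numpy as np

        def build_qubo(W, b, c, y, lam):
            nv, nh = W.shape
            Q = {}
            for i in range(nv):
                # RBM visible bias plus the penalty: for binary v_i,
                # (v_i - y_i)^2 = v_i * (1 - 2*y_i) + const
                Q[(i, i)] = -b[i] + lam * (1.0 - 2.0 * y[i])
            for j in range(nh):
                Q[(nv + j, nv + j)] = -c[j]          # hidden biases
            for i in range(nv):
                for j in range(nh):
                    Q[(i, nv + j)] = -W[i, j]        # bipartite couplings
            return Q   # feed to a quantum annealer's sampler or a classical heuristic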
    A Deep Learning Method for Comparing Bayesian Hierarchical Models. (arXiv:2301.11873v3 [stat.ML] UPDATED)
    Bayesian model comparison (BMC) offers a principled approach for assessing the relative merits of competing computational models and propagating uncertainty into model selection decisions. However, BMC is often intractable for the popular class of hierarchical models due to their high-dimensional nested parameter structure. To address this intractability, we propose a deep learning method for performing BMC on any set of hierarchical models which can be instantiated as probabilistic programs. Since our method enables amortized inference, it allows efficient re-estimation of posterior model probabilities and fast performance validation prior to any real-data application. In a series of extensive validation studies, we benchmark the performance of our method against the state-of-the-art bridge sampling method and demonstrate excellent amortized inference across all BMC settings. We then showcase our method by comparing four hierarchical evidence accumulation models that have previously been deemed intractable for BMC due to partly implicit likelihoods. In this application, we corroborate evidence for the recently proposed Lévy flight model of decision-making and show how transfer learning can be leveraged to enhance training efficiency. We provide reproducible code for all analyses and an open-source implementation of our method.  ( 2 min )
    Adapting to Mixing Time in Stochastic Optimization with Markovian Data. (arXiv:2202.04428v3 [cs.LG] UPDATED)
    We consider stochastic optimization problems where data is drawn from a Markov chain. Existing methods for this setting crucially rely on knowing the mixing time of the chain, which in real-world applications is usually unknown. We propose the first optimization method that does not require the knowledge of the mixing time, yet obtains the optimal asymptotic convergence rate when applied to convex problems. We further show that our approach can be extended to: (i) finding stationary points in non-convex optimization with Markovian data, and (ii) obtaining better dependence on the mixing time in temporal difference (TD) learning; in both cases, our method is completely oblivious to the mixing time. Our method relies on a novel combination of multi-level Monte Carlo (MLMC) gradient estimation together with an adaptive learning method.  ( 2 min )
    Learning low-rank latent mesoscale structures in networks. (arXiv:2102.06984v5 [cs.SI] UPDATED)
    It is common to use networks to encode the architecture of interactions between entities in complex systems in the physical, biological, social, and information sciences. To study the large-scale behavior of complex systems, it is useful to examine mesoscale structures in networks as building blocks that influence such behavior. We present a new approach for describing low-rank mesoscale structures in networks, and we illustrate our approach using several synthetic network models and empirical friendship, collaboration, and protein--protein interaction (PPI) networks. We find that these networks possess a relatively small number of 'latent motifs' that together can successfully approximate most subgraphs of a network at a fixed mesoscale. We use an algorithm for 'network dictionary learning' (NDL), which combines a network-sampling method and nonnegative matrix factorization, to learn the latent motifs of a given network. The ability to encode a network using a set of latent motifs has a wide variety of applications to network-analysis tasks, such as comparison, denoising, and edge inference. Additionally, using a new network denoising and reconstruction (NDR) algorithm, we demonstrate how to denoise a corrupted network by using only the latent motifs that one learns directly from the corrupted network.  ( 3 min )
    Deep Network Approximation: Beyond ReLU to Diverse Activation Functions. (arXiv:2307.06555v1 [cs.LG])
    This paper explores the expressive power of deep neural networks for a diverse range of activation functions. An activation function set $\mathscr{A}$ is defined to encompass the majority of commonly used activation functions, such as $\mathtt{ReLU}$, $\mathtt{LeakyReLU}$, $\mathtt{ReLU}^2$, $\mathtt{ELU}$, $\mathtt{SELU}$, $\mathtt{Softplus}$, $\mathtt{GELU}$, $\mathtt{SiLU}$, $\mathtt{Swish}$, $\mathtt{Mish}$, $\mathtt{Sigmoid}$, $\mathtt{Tanh}$, $\mathtt{Arctan}$, $\mathtt{Softsign}$, $\mathtt{dSiLU}$, and $\mathtt{SRS}$. We demonstrate that for any activation function $\varrho\in \mathscr{A}$, a $\mathtt{ReLU}$ network of width $N$ and depth $L$ can be approximated to arbitrary precision by a $\varrho$-activated network of width $6N$ and depth $2L$ on any bounded set. This finding enables the extension of most approximation results achieved with $\mathtt{ReLU}$ networks to a wide variety of other activation functions, at the cost of slightly larger constants.  ( 2 min )
    A kernel Stein test of goodness of fit for sequential models. (arXiv:2210.10741v3 [stat.ML] UPDATED)
    We propose a goodness-of-fit measure for probability densities modeling observations with varying dimensionality, such as text documents of differing lengths or variable-length sequences. The proposed measure is an instance of the kernel Stein discrepancy (KSD), which has been used to construct goodness-of-fit tests for unnormalized densities. The KSD is defined by its Stein operator: current operators used in testing apply to fixed-dimensional spaces. As our main contribution, we extend the KSD to the variable-dimension setting by identifying appropriate Stein operators, and propose a novel KSD goodness-of-fit test. As with the previous variants, the proposed KSD does not require the density to be normalized, allowing the evaluation of a large class of models. Our test is shown to perform well in practice on discrete sequential data benchmarks.  ( 2 min )
    A Novel Bayes' Theorem for Upper Probabilities. (arXiv:2307.06831v1 [stat.ML])
    In their seminal 1990 paper, Wasserman and Kadane establish an upper bound for the Bayes' posterior probability of a measurable set $A$, when the prior lies in a class of probability measures $\mathcal{P}$ and the likelihood is precise. They also give a sufficient condition for such an upper bound to hold with equality. In this paper, we introduce a generalization of their result by additionally addressing uncertainty related to the likelihood. We give an upper bound for the posterior probability when both the prior and the likelihood belong to a set of probabilities. Furthermore, we give a sufficient condition for this upper bound to become an equality. This result is interesting in its own right, and has the potential to be applied to various fields of engineering (e.g. model predictive control), machine learning, and artificial intelligence.  ( 2 min )
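    A worked toy version of the statement, discretized so the upper posterior is a maximum over finite sets of priors and likelihoods (an illustration, not the paper's general measure-theoretic result):
        import numpy as np

        thetas = np.array([0.2, 0.5, 0.8])                  # discrete parameter grid
        priors = [np.array([0.6, 0.3, 0.1]), np.full(3, 1/3)]
        likelihoods = [thetas ** 3 * (1 - thetas),          # e.g. 3 successes, 1 failure
                       thetas ** 2 * (1 - thetas) ** 2]     # a second plausible model
        A = np.array([False, False, True])                  # event: theta = 0.8

        def posterior(prior, lik):
            post = prior * lik
            return post[A].sum() / post.sum()

        # the upper posterior of A over both sets of probabilities
        print(max(posterior(p, l) for p in priors for l in likelihoods))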
    The complexity of non-stationary reinforcement learning. (arXiv:2307.06877v1 [cs.LG])
    The problem of continual learning in the domain of reinforcement learning, often called non-stationary reinforcement learning, has been identified as an important challenge to the application of reinforcement learning. We prove a worst-case complexity result, which we believe captures this challenge: Modifying the probabilities or the reward of a single state-action pair in a reinforcement learning problem requires an amount of time almost as large as the number of states in order to keep the value function up to date, unless the strong exponential time hypothesis (SETH) is false; SETH is a widely accepted strengthening of the P $\neq$ NP conjecture. Recall that the number of states in current applications of reinforcement learning is typically astronomical. In contrast, we show that just adding a new state-action pair is considerably easier to implement.  ( 2 min )
    Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and Optimality. (arXiv:2307.06915v1 [stat.ML])
    Stochastic Gradient Descent (SGD) is one of the simplest and most popular algorithms in modern statistical and machine learning due to its computational and memory efficiency. Various averaging schemes have been proposed to accelerate the convergence of SGD in different settings. In this paper, we explore a general averaging scheme for SGD. Specifically, we establish the asymptotic normality of a broad range of weighted averaged SGD solutions and provide asymptotically valid online inference approaches. Furthermore, we propose an adaptive averaging scheme that exhibits both optimal statistical rate and favorable non-asymptotic convergence, drawing insights from the optimal weight for the linear model in terms of non-asymptotic mean squared error (MSE).  ( 2 min )
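    A sketch of one member of the weighted-averaging family on a generic SGD loop (the polynomial weights are chosen for illustration; the paper's adaptive scheme selects the weighting data-dependently from MSE considerations):
        import numpy as np

        def weighted_averaged_sgd(grad, x0, steps, lr=0.01, tau=2.0):
            x = np.asarray(x0, dtype=float).copy()
            avg, wsum = np.zeros_like(x), 0.0
            for t in range(1, steps + 1):
                x -= lr * grad(x)
                w = float(t) ** tau               # polynomially increasing weight
                wsum += w
                avg += (w / wsum) * (x - avg)     # running weighted average of iterates
            return avg

        # toy usage: noisy quadratic, minimizer at the origin
        rng = np.random.default_rng(0)
        print(weighted_averaged_sgd(lambda x: 2 * x + rng.normal(0, 1, x.shape),
                                    np.ones(5), 5000))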
    Robust scalable initialization for Bayesian variational inference with multi-modal Laplace approximations. (arXiv:2307.06424v1 [stat.ME])
    For predictive modeling relying on Bayesian inversion, fully independent, or "mean-field", Gaussian distributions are often used as approximate probability density functions in variational inference since the number of variational parameters is twice the number of unknown model parameters. The resulting diagonal covariance structure coupled with unimodal behavior can be too restrictive when dealing with highly non-Gaussian behavior, including multimodality. High-fidelity surrogate posteriors in the form of Gaussian mixtures can capture any distribution to an arbitrary degree of accuracy while maintaining some analytical tractability. Variational inference with Gaussian mixtures with full-covariance structures suffers from a quadratic growth in variational parameters with the number of model parameters. Coupled with the existence of multiple local minima due to nonconvex trends in the loss functions often associated with variational inference, these challenges motivate the need for robust initialization procedures to improve the performance and scalability of variational inference with mixture models. In this work, we propose a method for constructing an initial Gaussian mixture model approximation that can be used to warm-start the iterative solvers for variational inference. The procedure begins with an optimization stage in model parameter space in which local gradient-based optimization, globalized through multistart, is used to determine a set of local maxima, which we take to approximate the mixture component centers. Around each mode, a local Gaussian approximation is constructed via the Laplace method. Finally, the mixture weights are determined through constrained least squares regression. Robustness and scalability are demonstrated using synthetic tests. The methodology is applied to an inversion problem in structural dynamics involving unknown viscous damping coefficients.  ( 3 min )
    Energy Discrepancies: A Score-Independent Loss for Energy-Based Models. (arXiv:2307.06431v1 [stat.ML])
    Energy-based models are a simple yet powerful class of probabilistic models, but their widespread adoption has been limited by the computational burden of training them. We propose a novel loss function called Energy Discrepancy (ED) which does not rely on the computation of scores or expensive Markov chain Monte Carlo. We show that ED approaches the explicit score matching and negative log-likelihood loss under different limits, effectively interpolating between both. Consequently, minimum ED estimation overcomes the problem of nearsightedness encountered in score-based estimation methods, while also enjoying theoretical guarantees. Through numerical experiments, we demonstrate that ED learns low-dimensional data distributions faster and more accurately than explicit score matching or contrastive divergence. For high-dimensional image data, we describe how the manifold hypothesis puts limitations on our approach and demonstrate the effectiveness of energy discrepancy by training the energy-based model as a prior of a variational decoder model.  ( 2 min )
    Tackling Combinatorial Distribution Shift: A Matrix Completion Perspective. (arXiv:2307.06457v1 [cs.LG])
    Obtaining rigorous statistical guarantees for generalization under distribution shift remains an open and active research area. We study a setting we call combinatorial distribution shift, where (a) under the test- and training-distributions, the labels $z$ are determined by pairs of features $(x,y)$, (b) the training distribution has coverage of certain marginal distributions over $x$ and $y$ separately, but (c) the test distribution involves examples from a product distribution over $(x,y)$ that is not covered by the training distribution. Focusing on the special case where the labels are given by bilinear embeddings into a Hilbert space $H$: $\mathbb{E}[z \mid x,y ]=\langle f_{\star}(x),g_{\star}(y)\rangle_{H}$, we aim to extrapolate to a test distribution domain that is not covered in training, i.e., achieving bilinear combinatorial extrapolation. Our setting generalizes a special case of matrix completion from missing-not-at-random data, for which all existing results require the ground-truth matrices to be either exactly low-rank, or to exhibit very sharp spectral cutoffs. In this work, we develop a series of theoretical results that enable bilinear combinatorial extrapolation under gradual spectral decay as observed in typical high-dimensional data, including novel algorithms, generalization guarantees, and linear-algebraic results. A key tool is a novel perturbation bound for the rank-$k$ singular value decomposition approximations between two matrices that depends on the relative spectral gap rather than the absolute spectral gap, a result that may be of broader independent interest.  ( 2 min )
    On Collaboration in Distributed Parameter Estimation with Resource Constraints. (arXiv:2307.06442v1 [cs.LG])
    We study sensor/agent data collection and collaboration policies for parameter estimation, accounting for resource constraints and correlation between observations collected by distinct sensors/agents. Specifically, we consider a group of sensors/agents, each of which samples from different variables of a multivariate Gaussian distribution and has different estimation objectives, and we formulate a sensor/agent's data collection and collaboration policy design problem as a Fisher information maximization (or Cramér-Rao bound minimization) problem. When the knowledge of correlation between variables is available, we analytically identify two particular scenarios: (1) where the knowledge of the correlation between samples cannot be leveraged for collaborative estimation purposes and (2) where the optimal data collection policy involves investing scarce resources to collaboratively sample and transfer information that is not of immediate interest and whose statistics are already known, with the sole goal of increasing the confidence on the estimate of the parameter of interest. When the knowledge of certain correlation is unavailable but collaboration may still be worthwhile, we propose novel ways to apply multi-armed bandit algorithms to learn the optimal data collection and collaboration policy in our distributed parameter estimation problem and demonstrate that the proposed algorithms, DOUBLE-F, DOUBLE-Z, UCB-F, UCB-Z, are effective through simulations.  ( 2 min )
    Testing Sparsity Assumptions in Bayesian Networks. (arXiv:2307.06406v1 [stat.ML])
    Bayesian network (BN) structure discovery algorithms typically either make assumptions about the sparsity of the true underlying network, or are limited by computational constraints to networks with a small number of variables. While these sparsity assumptions can take various forms, frequently the assumptions focus on an upper bound for the maximum in-degree of the underlying graph $\nabla_G$. Theorem 2 in Duttweiler et al. (2023) demonstrates that the largest eigenvalue of the normalized inverse covariance matrix ($\Omega$) of a linear BN is a lower bound for $\nabla_G$. Building on this result, this paper provides the asymptotic properties of, and a debiasing procedure for, the sample eigenvalues of $\Omega$, leading to a hypothesis test that may be used to determine if the BN has max in-degree greater than 1. A linear BN structure discovery workflow is suggested in which the investigator uses this hypothesis test to aid in selecting an appropriate structure discovery algorithm. The hypothesis test performance is evaluated through simulations and the workflow is demonstrated on data from a human psoriasis study.  ( 2 min )
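    The raw test statistic itself is easy to compute; a sketch follows, noting that the debiasing step and the critical value of the hypothesis test come from the paper's asymptotic analysis and are not reproduced here:
        import numpy as np

        def normalized_precision_eigmax(X):
            """Largest eigenvalue of the normalized inverse covariance Omega,
            the raw statistic behind the paper's max in-degree test."""
            S = np.cov(X, rowvar=False)
            Omega = np.linalg.inv(S)
            d = np.sqrt(np.diag(Omega))
            return np.linalg.eigvalsh(Omega / np.outer(d, d))[-1]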

  • Open

    NPC Steven acknowledged me finally!! 🤯 ChatGPT driven agents in Unreal Engine - update 3
    submitted by /u/Chance_Confection_37 [link] [comments]  ( 8 min )
    What comes first, AI gf or AI bf?
    Was recently hearing an interview with George Hotz and the question came up, what will be invented first: an AI boyfriend or an AI girlfriend? Obviously he had some opinions, would be curious to hear what others have to say. submitted by /u/geepytee [link] [comments]  ( 8 min )
    Training an AI Copywriter
    I want to train an AI bot to be my copywriting sidekick, so it would help me write stuff in the voice, tone and format that my predecessor used. For this, I would need to feed it our entire webpage, some voice & tone norms, presentations and so on. Could you guys please help me with how to set this up? I have an OpenAI API key, and they did make GPT-4 available to use recently, so this should be doable, right? Thanks boys submitted by /u/Jacobo_csgo [link] [comments]  ( 8 min )
    “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors
    submitted by /u/IngloriousBastion [link] [comments]  ( 8 min )
    Apple is like the quiet guy in the corner watching a bar fight with the big 3; you know he is gonna do something REALLY bad ass, but what?
    Google, OpenAI, and Meta (Facebook or whatever) have been having a free-for-all, trying to topple each other with GPT-5, Llama, and Bard. However, Apple has been REALLY quiet on the issue of AI, and focused on the release of this overpriced monstrosity, the Apple Vision Pro. It's a really big gamble with the $3500 price tag, but so was the iPhone many moons ago, and now everyone will dump their credit and their bank account for the latest iPhone Version Doodaad. I believe Apple is sidestepping this brawl to focus on perfecting augmented reality. This will ultimately unite the hardware with AR applications, and eventually lead to AI applications being embedded into the Vision Pro bundle. In short, while the Big 3 are playing 'King of the AI Mountain' by pummeling each other to the ground…  ( 9 min )
    Where and how can I best train an Ai to really learn my art style?
    A lot of the AI tools I use just don't quite get what I'm asking them to do, and I'm not sure if I'm even able to train my own considering it's a borg of everyone else's work etc. I guess what I'm asking is if there is any standalone tool out there that's not as watered down as a lot of the ones that are available (most of the time for free)? I have a bunch of images I can feed it, along with examples. If anyone has any tips, suggestions, or work-arounds, please let me know! Much appreciated. submitted by /u/Maelasae [link] [comments]  ( 8 min )
    What is xAI and Why Did Elon Musk Launch It? 2023
    submitted by /u/__boiyah [link] [comments]  ( 8 min )
    AI Art Creator for Animations
    Are there any free art creators yet for doing simple animations? or does anyone have a link to how one would try to do this. TYIA. submitted by /u/Walfy07 [link] [comments]  ( 8 min )
    Is there a way to automatically enter free competitions?
    Is this possible and if so how would I go about this? Thanks for any help submitted by /u/captainofthememeteam [link] [comments]  ( 8 min )
    China moves to support generative AI, regulate applications
    China's internet watchdog and several other authorities, including the National Development and Reform Commission and the Ministry of Science and Technology, have jointly issued an interim regulation on the management of generative artificial intelligence (AI) services. The regulation, published on the website of the Cyberspace Administration of China (CAC) on Thursday, will go into effect on Aug. 15. submitted by /u/Tiger_Claw_1 [link] [comments]  ( 8 min )
    Looking to get into AI Research
    Hi, I am a high school student in my last year, looking to get into AI research and potentially starting my own research in the future. What further studies should I pursue in college if my goal is a PhD? It would be really helpful to know which degrees to pursue in order to start my own research at some point. (Pardon my poor English.) submitted by /u/ShreeyanxRaina [link] [comments]  ( 8 min )
    AI Won’t Really Kill Us All, Will It? - The Atlantic (transcript and podcast)
    submitted by /u/RADICCHI0 [link] [comments]  ( 8 min )
    How do people actually make money using AI?
    I’ve been seeing a lot of posts about people making money off ChatGPT and other software. Is it even an industry worth getting into? submitted by /u/Hititfromtheback6969 [link] [comments]  ( 8 min )
    One-Minute Daily AI News 7/12/2023
    Anthropic, the AI startup co-founded by ex-OpenAI execs, today announced the release of a new text-generating AI model, Claude 2. The successor to Anthropic’s first commercial model, Claude 2 is available in beta starting today in the U.S. and U.K. both on the web and via a paid API.[1] Elon Musk has launched an AI company to challenge ChatGPT creator OpenAI, which the billionaire tech mogul has accused of being “woke”. On Wednesday, xAI said the goal of the new company would be to “understand the true nature of the universe”.[2] Chip designer Nvidia will invest $50 million to speed up training of Recursion’s artificial intelligence models for drug discovery, the companies said on Wednesday, sending the biotech firm’s shares surging about 83%.[3] For decades, morning weather reports have relied on the same kinds of conventional models. Now, weather forecasting is poised to join the ranks of industries revolutionized by artificial intelligence.A pair of papers, published Wednesday in the scientific journal Nature, touts the potential of two new AI forecasting approaches — systems that could yield faster and more accurate results than traditional models, researchers say.[4] Sources: [1] https://techcrunch.com/2023/07/11/anthropic-releases-claude-2-the-second-generation-of-its-ai-chatbot/ [2] https://www.aljazeera.com/economy/2023/7/13/musk-launches-artificial-intelligence-rival-to-chatgpts-openai [3] https://www.reuters.com/technology/nvidia-invests-50-mln-recursion-train-ai-models-drug-discovery-2023-07-12/ [4] https://www.scientificamerican.com/article/climate-change-could-stump-ai-weather-prediction/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Best way to start learning?
    Where is the best place/way to start learning OpenAI/AI? Are there good tutorials? Learning how to train a model? Would something else be better to start with? submitted by /u/jeffsmith202 [link] [comments]  ( 8 min )
    Project StyleScribble - generate text with your writing voice
    Wanted to share my new AI project: StyleScribble is an AI-powered web tool designed to assist content creators in generating text using their own unique writing voices. By leveraging the power of artificial intelligence, this tool revolutionizes the content creation process and empowers users to effortlessly produce high-quality written content. Link to demo: https://huggingface.co/spaces/daniellefranca96/styles-scribble-demo Link to sign up for the waitlist: https://stylescribble.fly.dev submitted by /u/katerinaptrv12 [link] [comments]  ( 8 min )
    Best FREE Chrome Extension for reusing prompts other than PromptDrive.ai?
    This is the best I can find so far but before I start investing a lot of time into uploading prompts to this tool, I wanted to make sure this was the best on the market. AIPRM is great for pre-done prompts but limits private prompts. Appreciate any help! submitted by /u/Life-Hacking [link] [comments]  ( 8 min )
    Best tool/system for keeping up and organizing AI tools... Notion, Clickup or...?
    I'm sure this comes down to preference, but before I start spending a ton of time building this out, I wanted some feedback on what others have found works best. Clickup Example: https://doc.clickup.com/37456139/d/h/13q28b-364/176f834177eb5cb Notion Example: https://enchanting-trader-463.notion.site/AI-Database-f917ca2e609b45478fe7bc2c8d544877 Long-time Evernote & G Docs user, but neither of these is cutting it. submitted by /u/Life-Hacking [link] [comments]  ( 8 min )
  • Open

    [P] Colab - Generate JSON Dataset and Evaluate LLMs
    Here's a free colab notebook that I've been playing around with to generate JSON datasets from PDFs for fine-tuning LLMs and evaluate outputs/prompts for Toxicity, Bias, Quality etc. Colab: https://colab.research.google.com/drive/1KCn1HIeD3fQy8ecT74yHa3xgJZvdNvqL?usp=sharing GitHub Repo: https://github.com/kw2828/guardrail-ml submitted by /u/Educational_Grass_38 [link] [comments]  ( 8 min )
    [D] Looking to enroll in a MS in AI/ML
    I’m looking into getting my MS in ML soon and I’m not sure if the market is good. Some friends and family are telling me it's too saturated and I won't find a job, others are telling me you still need Leetcode to get a job, and others tell me the job market is extremely competitive and I likely won't get one. I'm worried that I'm making a mistake. Here are some facts about me: 1. I am currently working at a company where I have 5 rotations every 9 months, 3 of which will be AI/ML related, which means I'll definitely have at least 3 projects in hand on a global scale (the company is global). By the time I'm done with the MS, I'll probably already have 2-3 years of hands-on experience. That's about it. Should I be worried? Please advise me. Thank you submitted by /u/KManYuksi [link] [comments]  ( 9 min )
    [D] ICCV Reviews are NOT out
    zzz submitted by /u/Towzeur [link] [comments]  ( 8 min )
    [D] Ray vs. AWS Batch for Distributed Training
    Hello all, In our organization, we are currently using Metaflow as our managed training infrastructure and leveraging the `@batch` decorator for compute. Using Batch, we also have access to multi-node parallel jobs (`@parallel` decorator) for distributed training and we've used it to great effect for fine-tuning some LLMs. We are now thinking of adopting Ray Train since it seems to be very popular nowadays and is gaining lots of traction. Wondering how Ray Train compares to Metaflow (AWS Batch) and what the pros/cons are for both, particularly in the context of scalable training of models. Please kindly share any insights. Thanks in advance! submitted by /u/rirhun [link] [comments]  ( 8 min )
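For readers weighing the same choice, the core of Ray Train is small enough to sketch. The following is a minimal outline assuming Ray 2.x and PyTorch; train_func is a placeholder for your own training loop, not anything from the original post:

    # Minimal Ray Train outline (assumes Ray 2.x; train_func is a stand-in
    # for your own PyTorch training loop).
    from ray.train import ScalingConfig
    from ray.train.torch import TorchTrainer

    def train_func():
        # per-worker training loop; Ray handles process-group setup and
        # device placement for distributed data-parallel training
        pass

    trainer = TorchTrainer(
        train_func,
        scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
    )
    result = trainer.fit()

One practical difference from AWS Batch multi-node parallel jobs is that Ray expresses scaling declaratively in code rather than in job definitions, which tends to make local-to-cluster iteration faster.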
Is the following a valid way to combine models? [D]
I am a physician conducting research on intensive care data from many patients. I have full ethical approval, this is just exploratory, nobody will be treated based on my results, and I have no statistician. I have a question in principle, rather than looking for a specific solution. I have trained and tuned three models on my (very imperfect) data. They make binary predictions about the likely success of a treatment. I have developed a logistic regression model, a Gaussian naive Bayes model, and a C5.0 decision tree model. Each is imperfect in its own way. AUC for each is 0.72-0.79. I am wondering if I could ask each model to make predictions on a set of test data, and let the models 'vote'. For example, if 2 of 3 models agree, then I create a 'majority opinion' column in my data and go with that result. I figure it might weed out some weakness. Is this a 'valid' way to do this, or is it pure nonsense in the world of machine learning? submitted by /u/e05bf027 [link] [comments]  ( 9 min )
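For what it's worth, this is a standard, valid technique: hard (majority) voting, which tends to help when the models' errors are not strongly correlated. A minimal scikit-learn sketch follows; note that sklearn has no C5.0, so a plain decision tree stands in for it, and the synthetic data is only a placeholder for the real dataset:

    # Hard majority voting across three classifiers (sketch; sklearn has no
    # C5.0, so DecisionTreeClassifier stands in, and the data is synthetic).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    vote = VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("gnb", GaussianNB()),
            ("tree", DecisionTreeClassifier(max_depth=5)),
        ],
        voting="hard",  # each model casts one vote; the majority wins
    )
    vote.fit(X_train, y_train)
    print(vote.score(X_test, y_test))

Evaluating the vote on data none of the three models saw during training or tuning is the main thing to get right.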
    [D] Text analysis alternative to LIWC?
I'm working on a social science project and have a lot of text data that I would like to analyse across certain groups. Most of the papers that I've read on this use LIWC, but I unfortunately don't have access to that. An alternative I've found was the Python package empath, but it doesn't account for the use of pronouns, which I know is going to be an important feature here. Does anyone know of a better alternative? Thanks a lot! submitted by /u/PlainJane049 [link] [comments]  ( 8 min )
    [R] Nvidia RTX 4090 ML benchmarks. Under QEMU/KVM. Image + Transformers. FP16/FP32.
Motivation: I am running a Proxmox instance for tinkering with devops stuff, k8s, CI and so on, and wanted to also have the ability to run ML workloads, specifically any kind of ClosedAI open-sourced alternatives like Guanaco, WizardLM, Starcoder, Codegen, as well as having some kind of pre-prod environment for ML deployments. Environment: all the tests were run within a virtual machine (QEMU) managed by Proxmox 7.4-3. CPU: Intel Xeon 2696v4 2.2GHz. Storage: Samsung SSD 870 EVO. OS: Linux gpu-node 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC 2023 x86_64 GNU/Linux, DISTRIB_DESCRIPTION="Ubuntu 22.04.2 LTS". Python 3.10.12 (main, Jul 5 2023, 18:54:27) [GCC 11.2.0] on linux. Torch version: 2.1.0.dev20230709+cu121. import torch; torch.version.cuda -> …  ( 9 min )
    [D] Stats package (JMP) vs machine learning for prediction
    CIO wants to predict "things" about mortgages using "technology". (There have been a few specifics and possible factors suggested, like if a loan was returned to us by the bank we sold it to). IT manager is super gung ho about using machine learning and AI. I'm not an experienced statistician, but I used JMP when I worked as a process engineer (yeah, squiggly career), and was like, why can't we just do this with JMP? I can figure out which factors are significant and which are most strongly correlated to what they want answers to. Is there a valid reason to go the machine learning route, or is it just a hot topic right now? No one seems willing to point out any flaws with that method and that makes me uncomfortable. submitted by /u/clarielz [link] [comments]  ( 9 min )
TokenMonster Ungreedy Subword Tokenizer V4: Enables Models to be 4x Smaller Whilst Achieving Higher Chr/Token (With Evidence) [P]
GitHub | Interactive Benchmark | Live Tokenizer TokenMonster is an ungreedy subword tokenizer and vocabulary trainer for Python, Go & Javascript. You can use one of my pretrained vocabularies or generate your own with the included tools. TokenMonster can tokenize text more efficiently than other tokenization methods, even when using a much smaller vocabulary. Here is a size-24000 TokenMonster vocabulary benchmarked against tiktoken cl100k_base (100256) and LLaMa (32000) (link to interactive benchmark): https://preview.redd.it/o16a9tbrurbb1.png?width=1506&format=png&auto=webp&s=66c11d2b8defd634c86756064125b70e8e5cb6d6 Unlike previous versions of TokenMonster, the current version does not compress the text into as few tokens as possible to achieve the high chr/token. TokenMonster V4 of…  ( 9 min )
    [P] INT-FP-QSim: Mixed Precision and Formats For Large Language Models and Vision Transformers
    Hello, We've released INT-FP-QSim: https://github.com/lightmatter-ai/INT-FP-QSim, a flexible simulator that allows running different LLMs and vision transformers at different formats (INT, FP) and precision (8-bit, 4-bit). The repository also has scripts for running simple evaluation with Stable Diffusion, Maskformer, Graphormer, ImageBind and CodeGen (with and without the simulator). Since there are users who may not have a good starting point for running different models, we hope that the example scripts provided in this repository will help there as well. submitted by /u/IllustriousSir_007 [link] [comments]  ( 8 min )
    [P] Federated Learning framework
Hello everyone 👋 We have launched a new project called MetisFL, a federated learning framework that allows developers to federate their machine learning workflows and train their models across distributed datasets without having to collect the data in a centralized location. https://github.com/NevronAI/metisfl Any feedback, or even a simple star, would be highly appreciated! submitted by /u/No-Literature-1930 [link] [comments]  ( 8 min )
    [P] Curated Transformers: Library of PyTorch LLM and other transformers, with common components
https://github.com/explosion/curated-transformers/ Curated Transformers is a new library of PyTorch LLM and other transformer architectures, implemented using common components. The library makes a different trade-off from Huggingface Transformers, which uses wholly separate implementations for each architecture. The advantage of HF's approach is that they can adopt new architectures very quickly, by taking the academic implementation more or less verbatim. Curated Transformers won't be able to add architectures as quickly, but the implementations share common elements, making it easier to mix and match components and see how architectures differ. We'll also be able to share bug fixes and improvements between different models. Check it out if you're interested in mixing and matching parts of different models, or if you just want a lighter-weight, pure-PyTorch experience, without any intervening abstractions.
Supported encoder-only models: ALBERT, BERT, CamemBERT, RoBERTa, XLM-RoBERTa.
Supported decoder-only models: GPT-NeoX, LLaMA, Falcon.
Generator wrappers: Dolly v2, Falcon.
All models can be loaded from Huggingface Hub. spaCy integration for curated transformers is provided by the spacy-curated-transformers package. submitted by /u/syllogism_ [link] [comments]  ( 9 min )
    [R] PyTorch DeepLab/MMSegmentation Tutorial Resources
    I've hit a bit of a roadblock. I'm struggling to find tutorials with PyTorch code for Semantic Segmentation. I initially used the MMSegmentation tutorial on its GitHub, but that didn't work as there were a number of missing files. Any help would be massively appreciated as I'm really struggling. submitted by /u/Charako [link] [comments]  ( 8 min )
[Research] Incorporating additional information into the latent space of CNNs?
Context: The input is a 3D MRI image, as well as a position and an orientation, both in 3D. The output is a 3D image. The most likely candidate for the architecture is a 3D U-Net. In general, we are trying to predict how a brain will be stimulated by a magnet producing an electric field. The position and orientation I am talking about are the position and orientation of the magnet. If you are interested, this paper works on the same problem: doi.org/10.1371/journal.pone.0254588
Question: How do we incorporate multiple different input types (image + vectors) in CNNs? We know the position and orientation information is extremely relevant to the output, so we know it can be incorporated directly into the latent space; what I don't understand is how to incorporate the two vectors into the latent space while keeping the feature dimensions. I can think of two options: 1) add additional feature layers to the latent space in which every layer holds one value, constant across the entire layer; that way, with 2 vectors and an angle, we would add 7 feature layers to the latent space. Here I am not sure the network can work with these, since they only really have meaning if you put them together into a vector. 2) completely flatten the latent space, append the values, and then reshape back into form. Here I doubt this is a good idea, since we are destroying all localization information in the features.
Previous approach: One solution we have thought about is feeding the network the MRI image as if it were taken from the given position and orientation, thereby incorporating the information indirectly. This produces the most accurate results in previous papers. Unfortunately, that solution leads to too long a preprocessing time, and since we need real-time inference it is not a valid option. submitted by /u/ClumsyClassifier [link] [comments]  ( 9 min )
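On option 1: adding the conditioning values as constant feature maps (broadcast-and-concatenate) is a common, workable pattern; convolutions can then read the full vector at every spatial location, so nothing forces the network to treat the channels in isolation. A minimal PyTorch sketch with illustrative shapes (none of the names come from the original project):

    # Broadcast-and-concatenate: turn a conditioning vector into constant
    # feature maps and stack them onto the latent features.
    import torch

    feats = torch.randn(2, 64, 8, 16, 16)  # (B, C, D, H, W) latent of a 3D U-Net
    cond = torch.randn(2, 7)               # position (3) + orientation (3) + angle (1)

    cond_maps = cond[:, :, None, None, None].expand(-1, -1, *feats.shape[2:])
    fused = torch.cat([feats, cond_maps], dim=1)  # (B, 64 + 7, D, H, W)
    print(fused.shape)                            # torch.Size([2, 71, 8, 16, 16])

An alternative worth knowing is FiLM-style conditioning, where a small MLP maps the vector to per-channel scale and shift parameters applied to the feature maps; it conditions the latent space without adding channels.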
    [P] Journal Hub - literature discussion platform project
    [ Removed by Reddit on account of violating the content policy. ] submitted by /u/iokarkan [link] [comments]  ( 8 min )
    [R] Deep Learning Models for Forest Canopy Estimation
I need to either build a tool or find a service that can take a satellite image of trees as input and then output an estimate of its forest canopy. Has anyone ever used, or does anyone know of, such deep learning models? Thanks. submitted by /u/Quantumercifier [link] [comments]  ( 8 min )
    [D] Overview of recent developments in audio-generative models (TTS & TTM)
    Hi everyone, I wrote a blog post on the advancements in Generative AI for audio, focusing on Text-to-Speech (TTS) and Text-to-Music (TTM) models. I survey how techniques from LLMs have been adapted for audio generation, leading to significant improvements. The article takes a technical look at some of the recent models in this space, including MusicLM, VALL-E, and also explains the key ideas behind Neural Audio Codecs and Residual Vector Quantization techniques, which have become essential in nearly all text-to-audio models. If you have an interest in the current state and future of Generative AI for audio, I hope you will find this informative! I drop the article link in the comments below 👇👇👇 I appreciate any feedback or thoughts you may have! submitted by /u/mrx-ai [link] [comments]  ( 9 min )
    [P] Fast debugging of audio machine learning models
I recently improved a library I created to support finding issues in audio data using audio embeddings. A mix of automatically detecting problematic data clusters and reviewing them visually can help speed up model debugging. The principle behind this is as follows:
Step 1: Compute audio embeddings for the raw data 🎶 This makes the data explorable by audio similarity. Depending on the properties the model captures, you will get different notions of similarity. E.g., if you use a model for speaker identification, you will probably order your data according to the speaker's voice properties.
Step 2: Identify problematic data slices using clustering 🔍 One strategy to get explicit suggestions for problematic data slices is clustering the samples based on audio embeddings. You can then compute your evaluation metrics for the identified clusters and search for clusters that, compared to the overall accuracy, show a significant accuracy drop.
Step 3: Review the supposed issues visually 👀 Especially when using only unstructured data, the results of your analysis will not be readily interpretable. However, reviewing the problematic clusters by listening and visualizing (e.g., drawing spectrograms) will help you filter out actual model and data issues.
I also created a Medium Post and an Example Notebook for this. Also check out this Interactive Result Visualization on Huggingface. submitted by /u/OkResearch6289 [link] [comments]  ( 9 min )
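For anyone wanting to try steps 2 and 3 without the library, the core loop is compact. A sketch under stated assumptions: the embeddings, labels, and predictions below are random placeholders standing in for your real arrays:

    # Cluster audio embeddings and flag clusters whose accuracy falls well
    # below the overall accuracy (emb, y_true, y_pred are placeholders).
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    emb = rng.normal(size=(1000, 128))      # stand-in audio embeddings
    y_true = rng.integers(0, 5, size=1000)  # stand-in labels
    y_pred = rng.integers(0, 5, size=1000)  # stand-in model predictions

    clusters = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(emb)
    overall = (y_pred == y_true).mean()
    for k in range(10):
        mask = clusters == k
        acc = (y_pred[mask] == y_true[mask]).mean()
        if acc < overall - 0.1:  # arbitrary gap threshold
            print(f"cluster {k}: acc {acc:.2f} vs overall {overall:.2f} ({mask.sum()} samples)")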
    [D] Overfit ML model
Hello, this is my first contribution to the sub. I have a dataset for a classification problem, and it seems that all the models I have trained are overfitting (F1 score close to 1). I can't wrap my head around how to solve it without removing important values. submitted by /u/Archyve [link] [comments]  ( 8 min )
[D] How to accelerate ViT models even further
I have conducted experiments and examples on accelerating ViT (Vision Transformer) using methods such as TensorRT, FasterTransformer, and xFormers. The experiments used a single A100 as a baseline. - https://github.com/bnabis93/vision-language-examples/tree/main/acceleration In xFormers, I tried applying sparse attention and memory-efficient attention to ViT, but there was an issue where the speed actually decreased, so I excluded those results. Generally, just performing TensorRT conversion significantly improves latency. In the case of FasterTransformer, optimized kernels are not provided in fp32, so it is not as effective as expected. However, after quantizing to fp16, it was more effective than simply performing TensorRT conversion. Are there any other methods for accelerating ViT? Using OpenAI's Triton is one option that comes to mind. If you have any other methods worth trying, I would appreciate it if you could let me know. submitted by /u/bono-93 [link] [comments]  ( 9 min )
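One more baseline worth timing before custom kernels: half-precision inference plus torch.compile, whose default GPU backend (TorchInductor) already generates Triton kernels. A sketch, assuming PyTorch >= 2.0, a CUDA GPU, and a timm ViT as a stand-in model:

    # Plain-PyTorch accelerations: fp16 + torch.compile (Triton-backed).
    import torch
    import timm

    model = timm.create_model("vit_base_patch16_224", pretrained=False)
    model = model.eval().cuda().half()
    model = torch.compile(model)  # TorchInductor fuses ops and emits Triton kernels

    x = torch.randn(32, 3, 224, 224, device="cuda", dtype=torch.half)
    with torch.inference_mode():
        y = model(x)
    print(y.shape)

Measured against a TensorRT fp16 engine this usually still loses, but it is a one-line change and a useful sanity check for where the remaining gap comes from.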
    [P] LLMs, NLP: Building a Model to Generate and Analyze Claim Narratives
    Hi all. I'm diving into the world of Natural Language Processing (NLP) and Large Language Models (LLMs), and I could really use some assistance with my project. Here's what I'm aiming to achieve: Imagine having a dataset with various features such as location, type of claim, and claim payment. Additionally, there's a column dedicated to narratives, which provide descriptions of each claim. For instance, a narrative could be "I was crossing the street and was hit by a car, which broke my leg." My goals are twofold: i) Develop a model capable of taking the three columns (type of claim, claim payment, and location) as input and generating a narrative. ii) Create another model that does the reverse: input a narrative and extract relevant information like the type of claim and the location. I would greatly appreciate any advice, or resources you can provide. Thank you in advance! submitted by /u/therobot20 [link] [comments]  ( 9 min )
[D] Encoder-only vs encoder-decoder vs decoder-only
I understand that encoder-only models (like BERT etc.) are mainly for learning representations of words, taking context on both sides. What I'm confused about is why you would need decoder-only vs encoder-decoder models. GPT and BLOOM are decoder-only, while I think T5 is enc-dec; I'm not sure why you would use one vs the other. Intuitively, an enc-dec model has more parameters and should be better at tasks where you have both complex text input and output, like say translation or summarization. Any ideas why decoder-only models are desirable, and also why they seem to work so well? Thanks in advance. submitted by /u/Western-Image7125 [link] [comments]  ( 8 min )
  • Open

    "Reinforcement Learning in Newcomblike Environments", Bell et al 2021
    submitted by /u/gwern [link] [comments]  ( 8 min )
    REINFORCE with Baseline not Learning
I have implemented REINFORCE using PyTorch and am testing it on the CartPole environment. My implementation allows for an optional baseline to be applied. At present, the baseline used is simply the mean of the returns earned during a trajectory. The agent will learn a good policy when I DO NOT use a baseline, but when I apply the baseline, the agent fails to learn anything. I cannot figure out why. I notice that the loss is always very close to zero when using the baseline, but it seems like that should be expected. When the network weights are still random, most of the actions will have a probability that is near 0.5, and thus a log probability that is close to log(0.5) ≈ -0.7. The returns for this environment are symmetric about the mean, so the weighted sum of the centered return…  ( 9 min )
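A near-zero scalar loss is indeed expected with a centered advantage and says nothing by itself; the gradient can still be informative. For comparison, a common sketch of the update follows; the tensors here are random placeholders for quantities collected during a rollout, and one frequent bug is letting gradients flow through the baseline, so the advantage is detached:

    # Mean-baseline REINFORCE update for one trajectory (placeholder tensors).
    import torch

    log_probs = torch.randn(200, requires_grad=True)  # stand-in for log pi(a_t|s_t)
    returns = torch.randn(200)                        # stand-in for returns G_t

    advantages = returns - returns.mean()             # mean baseline
    # detach so no gradient flows through the baseline; optionally normalize
    advantages = (advantages / (advantages.std() + 1e-8)).detach()

    loss = -(log_probs * advantages).sum()
    loss.backward()

If normalizing as well as centering, check the sign and scale of the advantages actually reaching the loss; a stray in-place operation or a baseline computed across the wrong axis is the usual culprit in batched implementations.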
    Beating DeepMind’s Game: Alchemy
    submitted by /u/Ok_Introduction9109 [link] [comments]  ( 8 min )
    Is offline-to-online RL some kind of Transfer-RL?
I read some papers about offline-to-online (O2O) RL and transfer RL, and I was trying to explore O2O transfer RL, where we have data for one environment, pre-train a model offline, and then improve it online in another environment. If the MDP structure is the same for the target and source environments while transferring, what is the exact difference between O2O RL and transfer RL under this assumption? Essentially they are both trying to adapt to the distribution drift, aren't they? submitted by /u/Blasphemer666 [link] [comments]  ( 8 min )
    PPO agent completing Street Fighter III on our RL Platform, it consistently outperformed when using deterministic actions instead of sampling them proportionally to their probability. Why in your opinion? (see comment for details)
    submitted by /u/DIAMBRA_AIArena [link] [comments]  ( 8 min )
  • Open

    Symbol tuning improves in-context learning in language models
    Posted by Jerry Wei, Student Researcher, and Denny Zhou, Principal Scientist, Google Research A key feature of human intelligence is that humans can learn to perform new tasks by reasoning using only a few examples. Scaling up language models has unlocked a range of new applications and paradigms in machine learning, including the ability to perform challenging reasoning tasks via in-context learning. Language models, however, are still sensitive to the way that prompts are given, indicating that they are not reasoning in a robust manner. For instance, language models often require heavy prompt engineering or phrasing tasks as instructions, and they exhibit unexpected behaviors such as performance on tasks being unaffected even when shown incorrect labels. In “Symbol tuning improves…  ( 93 min )
  • Open

    Data modeling techniques in modern data warehouse
Hello, data enthusiast! In this article let’s discuss “Data Modelling”, right from the traditional and classical ways through to today’s digital way, especially for analytics and advanced analytics. Yes! Of course, for the last 40+ years we all worked on OLTP, and then we started focusing on OLAP. After the cloud era came into the picture… Read More »Data modeling techniques in modern data warehouse The post Data modeling techniques in modern data warehouse appeared first on Data Science Central.  ( 26 min )
    A Detailed Guide for Data Handling Techniques in Data Science
Image Source: Author Introduction Data engineers and data scientists need data for their day-to-day jobs. Of course, it could be for data analytics, data prediction, data mining, building machine learning models, etc. All of these are taken care of by the respective team members, and they need to work towards identifying relevant data sources, and associated with… Read More »A Detailed Guide for Data Handling Techniques in Data Science The post A Detailed Guide for Data Handling Techniques in Data Science appeared first on Data Science Central.  ( 26 min )
  • Open

Training a convolutional neural network with MATLAB: image recognition AI.
    submitted by /u/Character_Ad_1385 [link] [comments]  ( 8 min )
  • Open

    Effectively solve distributed training convergence issues with Amazon SageMaker Hyperband Automatic Model Tuning
    Recent years have shown amazing growth in deep learning neural networks (DNNs). This growth can be seen in more accurate models and even opening new possibilities with generative AI: large language models (LLMs) that synthesize natural language, text-to-image generators, and more. These increased capabilities of DNNs come with the cost of having massive models that […]  ( 11 min )
  • Open

    AI-Fueled Productivity: Generative AI Opens New Era of Efficiency Across Industries
    A watershed moment on Nov. 22, 2022, was mostly virtual, yet it shook the foundations of nearly every industry on the planet. On that day, OpenAI released ChatGPT, the most advanced artificial intelligence chatbot ever developed. This set off demand for generative AI applications that help businesses become more efficient, from providing consumers with answers Read article >  ( 11 min )
    Full-Scale Gaming: ‘Dragon’s Dogma: Dark Arisen’ Comes to GeForce NOW
    Arise, members! Capcom’s legendary role-playing game Dragon’s Dogma: Dark Arisen joins the GeForce NOW library today. The RPG and THQ Nordic’s Jagged Alliance 3 are newly supported on GeForce NOW, playable on nearly any device. From Dusk Till Pawn Become the Arisen and take up the challenge in Capcom’s critically acclaimed RPG. Set in a Read article >  ( 5 min )
  • Open

    Making sense of all things data
    Abel Sanchez helps industries and executives shift their operations in order to make sense of their data and use it to help their bottom lines.  ( 8 min )
    How an “AI-tocracy” emerges
    In China, the use of AI-driven facial recognition helps the regime repress dissent while enhancing the technology, researchers report.  ( 9 min )

  • Open

    [D] Does the university/program for masters for ML/AI roles matter?
I am curious about this. I have an undergrad in CS from a large public university in the States and 2 YOE as a SWE from India. Not now, but a few more years down the line, when I have like 4-5 YOE as a SWE, I would want to get into ML/AI roles. I don't want to waste any more time getting into a full-time program and lose 1-2 more years of my 20s (I am 25 already), so I was looking for online programs from a >= decent university in North America. The thing about me is that I learn best by self-studying. For me personally, university prestige doesn't do anything, cos they all read from PPTs and use the same books. I took a summer at Cornell and ended up studying on my own. So, for me, spending so much money or taking 1-2 years out for a full-time master's doesn't make sense. I just want to get a master's that is cheap, takes comparatively less time, and is part-time so that I can work at the same time and keep up powerlifting and other hobbies, and that after completion makes me eligible for non-research ML/AI roles. In all, I am only doing this to show on my resume to HR that "here is a guy who you won't need to spend too much time analyzing, cos he has done a master's and some additional math/stat courses, and an institution has certified him because it got paid to do so". submitted by /u/ItsWasntMyFault [link] [comments]  ( 9 min )
    [D] AI Text to Image Generation Project
I am a computer science student. I have my final-year project, whose deadline is in a few weeks. I have to create an AI text-to-image generator trained on a custom dataset. The user should provide it a prompt and it should generate an image based on that prompt. In fact, it will be limited, as it will be trained on a limited custom dataset. I have searched throughout the internet but I am unable to find any helpful material, code, or other resources... and those that I found, I am unable to understand because of my limited expertise. Is there anyone who could help me? submitted by /u/mufeezahmad [link] [comments]  ( 8 min )
    [D] How to do Semi Supervised Learning or Learning with no labels?
I would appreciate some direction and pointers in this research area. Some possible options could be representation learning (probably using an autoencoder, or an InfoNCE/contrastive loss). Is there any other good way to solve the problem of semi-supervised learning, where we have partial labels for the data and want our model to do well on unlabelled instances as well? submitted by /u/Rohit901 [link] [comments]  ( 8 min )
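Besides contrastive pretraining, a simple baseline worth running first is self-training (pseudo-labeling), which scikit-learn ships out of the box; unlabeled rows are marked with -1. A sketch on synthetic data:

    # Self-training baseline: unlabeled samples carry the label -1 and get
    # pseudo-labeled iteratively whenever the model is confident enough.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.semi_supervised import SelfTrainingClassifier

    X, y = make_classification(n_samples=1000, random_state=0)
    y_partial = y.copy()
    rng = np.random.default_rng(0)
    y_partial[rng.random(1000) < 0.9] = -1  # hide 90% of the labels

    model = SelfTrainingClassifier(LogisticRegression(max_iter=1000), threshold=0.9)
    model.fit(X, y_partial)
    print((model.predict(X) == y).mean())

scikit-learn's same module also offers LabelPropagation/LabelSpreading as graph-based alternatives.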
    [P] Free and Fast LLM Finetuning
Here's a colab notebook showing a new interface for LLM finetuning that I've been playing around with. Curious if folks here have feedback. Colab: https://colab.research.google.com/drive/1QMeGzR9FnhNJJFmcHtm9RhFP3vrwIkFn Docs: https://www.lamini.ai/blog/free-fast-and-furious-finetuning Github Repo: https://github.com/lamini-ai/lamini LLM fine-tuning includes several advanced optimizations:
- Chinchilla recipe: smaller models pretrained on more data increase inference speed
- Instruction fine-tuning: training on a small, high-quality set of instructions unlocks the knowledge learned during foundation model training
- Latency-constrained batching: achieves high utilization under load during token generation
- Containerized SLURM: combines the fast scheduling of SLURM with LLM containers
- Mixed-precision training: uses half-precision matrix operations for training
There are so many low-hanging fruits in LLM tuning, steering, and alignment. We are just getting started on this for enterprise and open source. For this reason I disagree with Sam Altman that the age of bigger models is over. We are still leaving orders of magnitude on the table, e.g. by not including optimizations like sparsity in these models. References for inspiration: [1] - https://arxiv.org/abs/2203.15556 [2] - https://arxiv.org/abs/1910.10683 [3] - https://www.usenix.org/system/files/osdi22-yu.pdf [4] - https://www.schedmd.com/ [5] - https://arxiv.org/abs/1710.03740 submitted by /u/gdiamos [link] [comments]  ( 9 min )
    Wind speed prediction -appreciate opinion [P]
Thank you for reading this post. As a kitesurfing fan, I'm trying to predict the wind speed +12h ahead. However, so far my success is limited. How can I improve my prediction?
Data: so far, I'm using wind speed & direction, temperature, and humidity at 10-minute intervals for the past 2 years, for 6 locations.
Features: day of the year and hour of the day. In addition, I have aggregated the 10-minute data to 1 hour and calculated the mean, min, max, and std. All the data was then made into 18 lags.
Models: linear regression, k-nearest neighbors, and random forest. However, only the random forest was slightly better than the linear regression.
What types of features and models would you recommend? Cheers submitted by /u/Any-Description3824 [link] [comments]  ( 8 min )
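For reference, the aggregation, lags, and +12h target described above fit in a few lines of pandas. A sketch; the synthetic frame only stands in for the real 10-minute observations:

    # Hourly aggregation, 18 lags, and a +12h-ahead target from 10-minute data.
    import numpy as np
    import pandas as pd

    idx = pd.date_range("2021-01-01", periods=6 * 24 * 365, freq="10min")
    df = pd.DataFrame(
        {"wind_speed": np.random.default_rng(0).gamma(2.0, 3.0, len(idx))},
        index=idx,
    )  # stand-in for real observations

    hourly = df["wind_speed"].resample("1H").agg(["mean", "min", "max", "std"])
    for lag in range(1, 19):
        for col in ["mean", "min", "max", "std"]:
            hourly[f"{col}_lag{lag}"] = hourly[col].shift(lag)

    hourly["target_12h"] = hourly["mean"].shift(-12)  # wind speed 12 hours ahead
    data = hourly.dropna()

Two common upgrades: encode wind direction as sine/cosine of the angle rather than raw degrees (it is circular), and try gradient-boosted trees, which often edge out random forests on tabular lag features. Also make sure the train/test split is chronological, not shuffled.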
    [D] NeurIPS 2023 reviews release time?
    Hey guys, I was wondering if y’all know when NeurIPS reviews for this year would be released? I know the response period starts on August 4, but do we get to see reviews before? Are they all released at once, or gradually as reviewers finish them? Thanks! submitted by /u/Icy_Background_4524 [link] [comments]  ( 8 min )
    [R] detrex: A Strong Benchmark for Detection Transformers
In October of last year, we open-sourced detrex as the first unified research platform focusing on DETR-based algorithms. After several version updates and a significant number of experiments, we conducted a detailed benchmarking of the DETR-based models supported by detrex. Our paper has already been preprinted on arXiv: https://arxiv.org/abs/2306.07265 And our project repo is here: https://github.com/IDEA-Research/detrex The main characteristics of detrex can be summarized as follows:
- Fully utilizes LazyConfig as the configuration system, which is highly flexible, convenient, and easily modifiable.
- Reproduces results on mainstream algorithms, surpassing the original implementations by 0.2 AP to 1.1 AP.
- Supports many SoTA backbones, including EVA, InternImage, FocalNet, Swin-T, etc.
- Each algorithm is implemented as an independent project, ensuring no mutual interference between algorithms. Users can confidently implement their own ideas based on detrex.
- The training code is simple, consisting of approximately 200 lines, and highly customizable, allowing for easy modifications and hacks.
- Abundant documentation and tutorials are provided.
- Active responses to issues and timely repository updates.
Here are some results from our paper; more details can be found there: https://preview.redd.it/mdxs4xydblbb1.png?width=1007&format=png&auto=webp&s=a92acc36663365c84be8f86e74c521157fa26202 https://preview.redd.it/f4c3zkvfblbb1.png?width=726&format=png&auto=webp&s=d684591dbd9e16610f3b9b483db1190fe87c7589 https://preview.redd.it/zvcy3tajblbb1.png?width=878&format=png&auto=webp&s=03ce7eb07dbc249a03f8536ce9fad66dbd43841a submitted by /u/Technical-Vast1314 [link] [comments]  ( 9 min )
    [D] Wondering if anyone does HPC or machine learning in Windows?
With the increased productivity of VS Code, does anyone do their dev work for machine learning on a Windows server? I have also been having problems running a Jupyter server on Ubuntu and accessing my GPUs. Is it easier on a Windows server? submitted by /u/Studyr3ddit [link] [comments]  ( 8 min )
    [D] How vital is it to physically attend SIGGRAPH for my early career? Is it worth missing out on hiking the Pyrenees with my girlfriend?
Bluntly: Is the added benefit of attending SIGGRAPH (instead of merely indulging in the Virtual Access stuff) worth not hiking the Pyrenees with my girlfriend? I am a graduate student specializing in computer graphics and AI stuff. I'm very intent on finding some cool career at the intersection of technology and art, so I signed up for SIGGRAPH back in April. I've been scouring information on generative art, NeRFs, and anything on the cutting edge of graphics for a while now, so this is obviously the right place. I'd be happy enough indulging in all the talks and demos, but I'm especially interested in the networking aspect and finding a way to give myself a leg up on discovering a promising career. I have been feverishly working to develop my career since I've gotten to grad school, so this…  ( 10 min )
    [D] What is the most efficient version of OpenAI Whisper?
Hi everyone, I know that there are some different versions of Whisper available in the open-source community (WhisperX, Whisper JAX, etc.), but I'm trying to stay up to date on the best version of the model. Specifically, I'm trying to understand the best Whisper implementation for transcribing a big batch of videos (~10k videos, ~30 min long). I'd like to know your thoughts on this. submitted by /u/paulo_zip [link] [comments]  ( 8 min )
[R] If you have used LIME with BERT... help me out please
If anyone has successfully implemented a LIME text explanation with a BERT model, please share your code with me. I am trying to get LIME to work for LayoutLM; BERT and LayoutLM have similar architectures, so I may find something in your code that I can apply to mine. submitted by /u/Affectionate_Win2460 [link] [comments]  ( 8 min )
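For the BERT half, the key is that LIME only needs a function mapping a list of texts to a class-probability matrix. A sketch over a HuggingFace pipeline; the sentiment model here is just a convenient public stand-in, and the alphabetical label ordering should be checked against your own label set:

    # LIME over a HuggingFace text-classification pipeline.
    import numpy as np
    from lime.lime_text import LimeTextExplainer
    from transformers import pipeline

    clf = pipeline(
        "text-classification",
        model="distilbert-base-uncased-finetuned-sst-2-english",  # stand-in model
        top_k=None,  # return scores for every label
    )

    def predict_proba(texts):
        outputs = clf(list(texts), truncation=True)
        # sort each sample's label scores by label name for a fixed column order
        return np.array(
            [[d["score"] for d in sorted(out, key=lambda d: d["label"])] for out in outputs]
        )

    explainer = LimeTextExplainer(class_names=["NEGATIVE", "POSITIVE"])
    exp = explainer.explain_instance("A clumsy but heartfelt film.", predict_proba, num_features=6)
    print(exp.as_list())

LIME will call predict_proba on thousands of perturbed strings, so batching inside that function (or a GPU) matters for LayoutLM-sized models.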
[D] 📚 The Learning Corner (Andrew Ng Free AI Courses Pt. 1)
📚 The Learning Corner (Andrew Ng Free AI Courses Pt. 1) This is a list of some of the best free AI courses by Andrew Ng; we will release the second part of the list in our next newsletter installment (link).
- Generative AI with Large Language Models
- LangChain: Chat With Your Data
- LangChain for LLM Application Development
- How Diffusion Models Work
submitted by /u/Yavero [link] [comments]  ( 8 min )
    [D] What's the status of chart-to-text
    I've been looking into solutions to transform charts into descriptions. There were great datasets introduced last year by this paper: https://arxiv.org/pdf/2203.06486.pdf But I haven't found many updates since, especially nothing super convincing. Do any of you know more? submitted by /u/Trick_Brain [link] [comments]  ( 8 min )
    [D] What are some interesting research that combine AI with the physical world
Most of the talk these days is about LLMs. I am curious about some daring AI projects that try to solve problems in the physical realm that humans can't handle, things beyond human intelligence. For instance, trying to extract formulas for physical behaviors whose analytical forms would be hard for humans to derive but that machine learning systems can approximate well. I don't know, maybe teaching small machines to fly with very few components, no hardcoded formulas, and an ML system that continuously learns from trial and error. Sorry if this sounds dumb. I am just a software engineer with some interest in AI. submitted by /u/besabestin [link] [comments]  ( 8 min )
    [D] LSTM multivariate forecasting
Hi, I'm currently working on time-series forecasting in pairs, where one time series is suspected to cause the other. I also have forecasts of the causing series, and I use them to predict the other series. Note that extrapolation capability is required, since the forecasts of the first series are not always within the range of the training data. I'm trying a simple LSTM model for the task, and I tried two setups: training on past values of both series plus the current value of the first series, and training on only the past and current values of the first series. To my surprise, the forecasts using only the first series' values, without the values of the series we forecast, are better. Does this make sense, or am I 100% doing something wrong? When scaling the data, do I need to scale the values of each series separately or scale them together? Also, what other approaches would you suggest for forecasting using another series' data and an existing forecast? submitted by /u/soundgardener666 [link] [comments]  ( 9 min )
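On the scaling question: the usual practice is one scaler per series, fit on the training split only and reused everywhere else, including for inverse-transforming the forecasts. A sketch with placeholder arrays:

    # One StandardScaler per series, fit on training data only.
    import numpy as np
    from sklearn.preprocessing import StandardScaler

    cause_train = np.random.default_rng(0).normal(10, 2, size=(500, 1))   # stand-in series A
    target_train = np.random.default_rng(1).normal(50, 9, size=(500, 1))  # stand-in series B

    scaler_cause = StandardScaler().fit(cause_train)
    scaler_target = StandardScaler().fit(target_train)

    cause_scaled = scaler_cause.transform(cause_train)
    target_scaled = scaler_target.transform(target_train)
    # after prediction: scaler_target.inverse_transform(preds)

On extrapolation: scaling does not help an LSTM generalize beyond the training range; modeling first differences (changes) instead of levels, or log-transforming, are common workarounds when inputs can exceed anything seen in training.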
    MongoDB for Semantic Search [D]
    I'm quite new to semantic search and looking for an easy-to-use database tool; been looking into MongoDB's semantic search capabilities and following this tutorial. Is anyone using MongoDB for semantic search? If so, what did you think? If not, was there a reason why? Do you have any recommendations that might help me pick the right tool to get started with vector/semantic search? submitted by /u/Important-Sun-3562 [link] [comments]  ( 8 min )
    [R] CorrFL: Correlation-Based Neural Network Architecture for Unavailability Concerns in a Heterogeneous IoT Environment
    An interesting article that tackles a new dimension in Federated Learning (FL) termed oblique federated learning. Link: https://ieeexplore.ieee.org/abstract/document/10132049 submitted by /u/ias18 [link] [comments]  ( 8 min )
    [Discussion] Video Translation Task
    I am working on a project related to Deep Learning translations. We have YouTube videos that we are looking to translate from English to Hindi. But we want the translated audio to sync with the mouth movements of the speaker. How to go about this task? submitted by /u/arhamm40182 [link] [comments]  ( 8 min )
    [D] What do you do with imbalanced target data in a non-binary RF classification problem?
    Hey all, I’m a relatively new data scientist working on my first serious classification project. We’re predicting our target, which is made up of four categories, using the random forest classifier from sklearn. For the target, three categories have about 250 samples and the fourth has about 1200 samples. So far, I’ve just done a stratified shuffle split around the target. Is there anything else that I should make sure to do here to account for the imbalance? Currently, my model just predicts the most common category for almost all of the predictions, which gives it a decent score but makes me feel like this isn’t a useful model. Would appreciate any advice. Thanks! submitted by /u/NDVGuy [link] [comments]  ( 8 min )
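Two standard first steps are class weighting and scoring with per-class metrics, which expose the "always predict the majority class" failure that plain accuracy hides. A sketch; the synthetic data below only mimics the described ~250/250/250/1200 split:

    # Class-weighted random forest plus per-class evaluation.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import balanced_accuracy_score, classification_report
    from sklearn.model_selection import train_test_split

    X, y = make_classification(
        n_samples=1950, n_classes=4, n_informative=8,
        weights=[0.13, 0.13, 0.13, 0.61], random_state=0,  # ~250/250/250/1200
    )
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    clf = RandomForestClassifier(class_weight="balanced", random_state=0)
    clf.fit(X_tr, y_tr)
    preds = clf.predict(X_te)
    print(classification_report(y_te, preds))
    print("balanced accuracy:", balanced_accuracy_score(y_te, preds))

If class weighting is not enough, oversampling the minority classes (e.g. imbalanced-learn's SMOTE, applied to the training split only) is the usual next step.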
    [P] Weights and biases approach -1 or 1
Hi, I've been designing my own C code to make neural networks, mainly for a project and to mess around. It works pretty well, but I find that the weights and biases of the finished network are practically all 1, -1, or thereabouts. Is this normal? Again, the network works pretty well; it just seems a bit weird to me. I'm using it to predict hand-written characters. I'm using sigmoid as my activation function (I tried ReLU but it didn't predict too well; maybe it's because I train with a pretty small sample pool?). Weights and biases are randomly initialized with values between -1 and 1 and are capped during training so that they don't exceed -1 or 1. I'm thinking this might be the issue. The problem is that if I don't cap them like this, they start to grow and grow the closer you get to the first hidden layer. The last hidden layer would have weights and biases between -1 and 1, but the others started to get bigger and bigger, so I ended up capping them like that to solve it. I believe this is common practice, but maybe the way I'm capping them is the problem (essentially, if a weight or bias exceeds -1 or 1 after having been corrected, I set it to -1 or 1 respectively). I'm not sure what other info I should be providing; as far as I know it's a pretty basic neural network, not doing anything too fancy. Thanks in advance. submitted by /u/Automatic-Syrup8490 [link] [comments]  ( 9 min )
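Weights pinned exactly at the cap are a classic sign that the optimizer wants larger magnitudes and the hard clamp is distorting training; hard weight capping is not standard practice for plain classifiers. The usual alternatives are fan-based initialization (Xavier/Glorot, which suits sigmoid nets) plus gradient clipping if updates explode. A NumPy sketch of Glorot-uniform initialization; the formula ports directly to C:

    # Glorot/Xavier uniform init: scales limits by layer fan-in and fan-out,
    # keeping activations and gradients from blowing up layer to layer.
    import numpy as np

    def glorot_uniform(n_in, n_out, seed=0):
        rng = np.random.default_rng(seed)
        limit = np.sqrt(6.0 / (n_in + n_out))
        return rng.uniform(-limit, limit, size=(n_out, n_in))

    W1 = glorot_uniform(784, 128)  # e.g. 28x28 input -> 128 hidden units
    print(W1.min(), W1.max())      # stays within +/- sqrt(6 / (n_in + n_out))

Biases are typically initialized to zero rather than random values.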
    [R] Cross-Entropy is All You Need… Or is It?
Hey r/machinelearning! I recently wrote an article titled "Cross-Entropy is All You Need… Or is It?" where I discuss extensions to the cross-entropy loss that make it work better on noisy production datasets. I've had great results on my end using this loss, so I wanted to share it with you all and get your opinions and insights! ➡️ Article link ➡️ Code & Colab link Summary: This article introduces the Smooth Generalized Cross-Entropy (SGCE) loss function, a way to address training classification models with noisy labels while still calibrating your model's confidence scores. The article demonstrates the application of SGCE to the task of Named Entity Recognition (NER), but it is applicable to many other tasks. The loss function combines the Cross-Entropy (CE) and Mean Absolute Error…  ( 9 min )
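For context, the CE/MAE combination described in the truncated summary matches the plain Generalized Cross-Entropy loss of Zhang & Sabuncu (2018), which the name SGCE presumably builds on; GCE recovers cross-entropy as q -> 0 and MAE at q = 1. A minimal PyTorch sketch of plain GCE only, not the article's exact SGCE:

    # Generalized Cross-Entropy: L_q = (1 - p_y^q) / q  (Zhang & Sabuncu, 2018).
    import torch

    def gce_loss(logits, targets, q=0.7):
        probs = torch.softmax(logits, dim=-1)
        p_y = probs.gather(1, targets.unsqueeze(1)).squeeze(1).clamp_min(1e-7)
        return ((1.0 - p_y**q) / q).mean()

    logits = torch.randn(8, 5, requires_grad=True)
    targets = torch.randint(0, 5, (8,))
    loss = gce_loss(logits, targets)
    loss.backward()

Intermediate q values down-weight the gradient on low-confidence (likely mislabeled) samples relative to CE, which is what buys the noise robustness.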
    [D] Are NLP jobs tied to one's own native language?
I'm considering doing a master's in NLP but am concerned that NLP jobs are tied to one's own native language: for example, if I'm a German native, do I only get NLP jobs dealing with the German language? Is this the case, or can it be broader? And is it still a wise choice to get a master's in this field? submitted by /u/dauntbooksandyou [link] [comments]  ( 8 min )
    [R] Enhancing Continuous Time Series Modelling with a Latent ODE-LSTM Approach
    https://arxiv.org/abs/2307.05126 submitted by /u/cici118 [link] [comments]  ( 8 min )
    [P] langchain-lite alternative
Although langchain is an impressive library, I tend to find that it is:
- a little unintuitive, at least for non-trivial examples or examples that don't have predefined chains/templates
- relatedly, overly prescriptive; the various levels of abstraction don't resonate with me
- relatedly, difficult to debug or to understand what's happening in intermediate steps of the chain, or what it's actually sending OpenAI
So, I built a "langchain-lite" package called llm-workflow https://github.com/shane-kercheval/llm-workflow The value proposition is basically:
- easily build up a sequence of tasks (e.g. prompt-template -> chat), called a workflow, where the output of one task serves as the input to the next task in the workflow
- track history; understand what's happening in each of the …  ( 10 min )
    [P] Haven: Deploy Open Source LLMs on Your Own Cloud
Open source LLMs are a great privacy-preserving alternative to using the ChatGPT API, but deploying them in production is still really hard - especially for people without ML experience and knowledge about ML infrastructure. With Haven, we make it possible to deploy LLMs with just a few lines of code! GitHub: https://github.com/havenhq/haven Website: https://haven.run/ Colab Demo: https://colab.research.google.com/drive/1eGGSisS9Du5-_KcaejY5y9vk9v7EIfba?authuser=2#scrollTo=YYECIKqAGId8 Haven’s main component is the manager, which comes as a container image that can be deployed in your GCP environment. The manager is your central entrypoint and can be used to spin up LLMs on GPUs with just one line of code:

    from havenpy import Haven

    client = Haven(":50051", "")
    worker_id = client.create_inference_worker(
        model_name="@huggingface/mosaicml/mpt-7b-chat",
        quantization="float16",
        gpu_type="T4",
        gpu_count=2)

After spinning up inference workers, you can query them with both a chat completion and a normal completion endpoint, similar to the OpenAI API.

    res = client.chat_completion(worker_id, messages=[{
        "content": "Who would win in a cagefight: Mark Zuckerberg or Elon Musk?",
        "role": "USER"
    }], stream=True, temperature=0.8)

    for r in res:
        print(r.text, flush=True, end="")

Our inference server is powered by vllm to provide the fastest inference possible. In the next weeks, we plan to add support for more cloud providers as well as fine-tuning with a single line of code. We'd love to hear your feedback! submitted by /u/jger227 [link] [comments]  ( 9 min )
  • Open

    A new AI Prompt Wars game was released
    Pretty fun actually. According to the FAQs it uses Stable Diffusion to generate images from participants' prompts and then compares them to determine a winner. submitted by /u/superander [link] [comments]  ( 8 min )
    Survey for generative AI photos and how this will affect Shutterstock and Getty
I am trying to figure out how people have been reacting to recent changes in generative AI technology and how it will affect the artistic community. It would be greatly appreciated if as many people as possible could fill out the survey attached. If you are a photographer, someone who purchases stock photos, or someone who likes to make AI images, this pertains to you. It will take one minute. Thanks. https://forms.gle/NZJaEVZBfQb1uiaM9 submitted by /u/mattyb24643 [link] [comments]  ( 8 min )
    Are there any free tools that can summarize very long video transcripts?
    All the tools I’ve seen for this can only summarize up to 10ish minutes of dialog in a free version. Looking for an hour plus. I don’t mind if the writing is worse than GPT4 or whatever, would be better than nothing submitted by /u/IndependentFormal8 [link] [comments]  ( 8 min )
    Is there an AI to generate images of ideas of websites?
I want an AI that can generate images of websites so I can develop them for personal use. Is there a tool that can do that? I tried Blue Willow, DALL-E, and Canva AI; none of them could generate it. submitted by /u/Luxy_Lockes [link] [comments]  ( 8 min )
    Holy Fuck...This is absolutely incredible. Achieving a better, more flexible result with 19 neurons instead of 100,000 neurons?!?
    submitted by /u/JakeYashen [link] [comments]  ( 8 min )
    ChatGPT Is Losing Users. Is The Artificial Intelligence Craze Over?
    submitted by /u/byteaw [link] [comments]  ( 8 min )
    Can you simulate ambience of a specific soundscape and its surroundings using AI?
Hello guys, I am working on a song project for which I wanted to make an intro that resembles a port/harbor back in the 16th/17th/18th century: the sound of waves, harbor bells, people working and talking, carrying cargo onto ships, and ultimately the sounds of a ship setting sail and leaving the harbor. At first I thought I'd have to put many different royalty-free sounds together to simulate this ambience, but I remembered that AI is growing fast and is already creating music. So I wanted to ask you guys if you know whether it's already possible to simulate a place's soundscape using AI, and if yes, how? This would really help me out. If you know other subreddits where I could ask, please feel free to suggest them. Thanks and have a good day! submitted by /u/space_dust0 [link] [comments]  ( 8 min )
    One-Minute Daily AI News 7/11/2023
    KPMG plans to spend $2 billion on AI and cloud services through an expanded partnership with Microsoft, aiming to incorporate AI into its core services. This move is in response to a slowdown in advisory deals and a challenging economic environment.[1] Elon Musk will host a conversation about AI with Rep. Ro Khanna (D-Calif.) and Rep. Mike Gallagher (R-Wis.) on Twitter Spaces Wednesday evening, a congressional aide confirmed to The Hill. Gallagher and Khanna have in the past stressed the need for balance in the technology, both expressing optimism about potential benefits while also sharing concerns about the potential dangers it can pose.[2] IT major Wipro announced the launch of the ai360 service and plans to invest $1 billion in AI over the next three years. The move follows Tata Consultancy Services’ announcement to train 25,000 engineers on generative AI tools.[3] IBM is considering the use of artificial intelligence chips that it designed in-house to lower the costs of operating a cloud computing service it made widely available this week, an executive said Tuesday.[4] Sources: [1] https://www.livemint.com/companies/news/kpmg-to-enter-into-a-deal-with-microsoft-spend-2-billion-in-ai-and-cloud-services-11689122323635.html [2] https://thehill.com/homenews/4092145-elon-musk-to-talk-ai-with-bipartisan-pair-of-lawmakers/ [3] https://www.livemint.com/companies/news/wipro-launches-ai360-will-invest-1-billion-into-ai-the-next-three-years-11689132228044.html [4] https://www.reuters.com/technology/ibm-mulls-using-its-own-ai-chip-new-cloud-service-lower-costs-2023-07-11/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
  • Open

    Faster training of Dueling DQN
Hi everyone, Has anyone worked with training dueling DQN faster or more sample-efficiently? I found these papers:
- Fast RL with slow updates: https://github.com/amazon-science/fast-rl-with-slow-updates
- Sample efficient deep reinforcement learning via uncertainty estimation: https://openreview.net/forum?id=vrW3tvDfOJQ
- Sample efficient deep reinforcement learning via episodic backward update: https://arxiv.org/abs/1805.12375
Has anyone worked with any of these? Or do you know any other strategies that have been used to make dueling DQN learn faster? (I have already experimented with Rainbow, so I wanted to try something on top of it.) submitted by /u/mr_formaldehyde [link] [comments]  ( 8 min )
    The inverse reward of the same MDP gives a different result when using value iteration
Hello, I have an MDP consisting of 2 machines, and I need to make decisions on when to do maintenance on a machine depending on the quality of the production. In one situation I created a reward structure based on the production loss of the system, and in the other situation I created a reward structure based on the throughput of the system, which is exactly the inverse of the production loss, as you can see in the figures below. So I would expect the result of the value iteration algorithm to be exactly the same, but it is not. Does anyone know what the reason for that could be, or what I can try in order to find out why this happens? Since value iteration yields an optimal solution, two different optimal solutions should not be possible. It would be really helpful if someone has an idea about this. https://preview.redd.it/ni23yr5pfjbb1.png?width=1590&format=png&auto=webp&s=5550afef24552b5f516c15aacfc6f0d37af6fdd7 https://preview.redd.it/dundnsrsfjbb1.png?width=453&format=png&auto=webp&s=20d300946254f1d08f3404100a565db27cb5658f submitted by /u/IcyWatch9445 [link] [comments]  ( 9 min )
  • Open

    Generative AI imagines new protein structures
    MIT researchers develop "FrameDiff," a computational tool that uses generative AI to craft new protein structures, with the aim of accelerating drug development and improving gene therapy.  ( 9 min )
  • Open

    Need help for an exam preparation task
This is the given task. I have never seen a neural network displayed like this. What do the upward arrows in the x1 input neuron and the upper neuron in the hidden layer mean? What are the values at the hidden layer (e.g. "3.456"); is this the threshold? If yes, would I then continue with 1 and multiply 1 by the next weight (8.805 in that case)? The problem is that none of my friends know either, and we do not know who else to ask, so we're asking Reddit. https://preview.redd.it/xgq4luwnakbb1.png?width=882&format=png&auto=webp&s=d032255a5296a5451eaa7c6e5821dd2981390086 submitted by /u/SoSickBR [link] [comments]  ( 8 min )
  • Open

    How to mark a language in HTML
In HTML you can mark the language of a piece of text by putting it inside span tags and setting the lang attribute to a two-letter abbreviation. For example, <span lang="fr">Allons enfants de la Patrie, Le jour de gloire est arrivé !</span> indicates that the first two lines of the French national anthem are in […] How to mark a language in HTML first appeared on John D. Cook.  ( 5 min )
  • Open

    Thinking beyond audio: Augmenting headphones for everyday digital interactions
    Because headphones rank among the most popular wearables in the market, we have an exciting opportunity to expand their capabilities through integrating existing sensors with supplementary ones to enable a wide variety of experiences that go beyond traditional audio control. The post Thinking beyond audio: Augmenting headphones for everyday digital interactions appeared first on Microsoft Research.  ( 12 min )
  • Open

    Score! Team NVIDIA Takes Trophy in Recommendation Systems
    A crack NVIDIA team of five machine learning experts spread across four continents won all three tasks in a hotly contested, prestigious competition to build state-of-the-art recommendation systems. The results reflect the group’s savvy applying the NVIDIA AI platform to real-world challenges for these engines of the digital economy. Recommenders serve up trillions of search Read article >  ( 6 min )
    MosaicML Helps AI Users Boost Accuracy, Cut Costs and Save Time
    Startup MosaicML is on a mission to help the AI community improve prediction accuracy, decrease costs and save time by providing tools for easy training and deployment of large AI models. In this episode of NVIDIA’s AI Podcast, host Noah Kravitz speaks with MosaicML CEO and co-founder Naveen Rao about how the company aims to Read article >  ( 5 min )
  • Open

    Isotuning With Applications To Scale-Free Online Learning. (arXiv:2112.14586v2 [cs.LG] UPDATED)
We extend and combine several tools of the literature to design fast, adaptive, anytime and scale-free online learning algorithms. Scale-free regret bounds must scale linearly with the maximum loss, both toward large losses and toward very small losses. Adaptive regret bounds demonstrate that an algorithm can take advantage of easy data and potentially have constant regret. We seek to develop fast algorithms that depend on as few parameters as possible, in particular they should be anytime and thus not depend on the time horizon. Our first and main tool, isotuning, is a generalization of the idea of balancing the trade-off of the regret. We develop a set of tools to design and analyze such learning rates easily and show that they adapt automatically to the rate of the regret (whether constant, $O(\log T)$, $O(\sqrt{T})$, etc.) within a factor 2 of the optimal learning rate in hindsight for the same observed quantities. The second tool is an online correction, which allows us to obtain centered bounds for many algorithms, to prevent the regret bounds from being vacuous when the domain is overly large or only partially constrained. The last tool, null updates, prevents the algorithm from performing overly large updates, which could result in unbounded regret, or even invalid updates. We develop a general theory using these tools and apply it to several standard algorithms. In particular, we (almost entirely) restore the adaptivity to small losses of FTRL for unbounded domains, design and prove scale-free adaptive guarantees for a variant of Mirror Descent (at least when the Bregman divergence is convex in its second argument), extend Adapt-ML-Prod to scale-free guarantees, and provide several other minor contributions about Prod, AdaHedge, BOA and Soft-Bayes.  ( 3 min )
    RELDEC: Reinforcement Learning-Based Decoding of Moderate Length LDPC Codes. (arXiv:2112.13934v2 [cs.IT] UPDATED)
    In this work we propose RELDEC, a novel approach for sequential decoding of moderate length low-density parity-check (LDPC) codes. The main idea behind RELDEC is that an optimized decoding policy is subsequently obtained via reinforcement learning based on a Markov decision process (MDP). In contrast to our previous work, where an agent learns to schedule only a single check node (CN) within a group (cluster) of CNs per iteration, in this work we train the agent to schedule all CNs in a cluster, and all clusters in every iteration. That is, in each learning step of RELDEC an agent learns to schedule CN clusters sequentially depending on a reward associated with the outcome of scheduling a particular cluster. We also modify the state space representation of the MDP, enabling RELDEC to be suitable for larger block length LDPC codes than those studied in our previous work. Furthermore, to address decoding under varying channel conditions, we propose agile meta-RELDEC (AM-RELDEC) that employs meta-reinforcement learning. The proposed RELDEC scheme significantly outperforms standard flooding and random sequential decoding for a variety of LDPC codes, including codes designed for 5G new radio.  ( 2 min )
    CT-based Subchondral Bone Microstructural Analysis in Knee Osteoarthritis via MR-Guided Distillation Learning. (arXiv:2307.04390v2 [eess.IV] UPDATED)
    Background: MR-based subchondral bone effectively predicts knee osteoarthritis. However, its clinical application is limited by the cost and time of MR. Purpose: We aim to develop a novel distillation-learning-based method named SRRD for subchondral bone microstructural analysis using easily-acquired CT images, which leverages paired MR images to enhance the CT-based analysis model during training. Materials and Methods: Knee joint images of both CT and MR modalities were collected from October 2020 to May 2021. Firstly, we developed a GAN-based generative model to transform MR images into CT images, which was used to establish the anatomical correspondence between the two modalities. Next, we obtained numerous patches of subchondral bone regions of MR images, together with their trabecular parameters (BV / TV, Tb. Th, Tb. Sp, Tb. N) from the corresponding CT image patches via regression. The distillation-learning technique was used to train the regression model and transfer MR structural information to the CT-based model. The regressed trabecular parameters were further used for knee osteoarthritis classification. Results: A total of 80 participants were evaluated. CT-based regression results of trabecular parameters achieved intra-class correlation coefficients (ICCs) of 0.804, 0.773, 0.711, and 0.622 for BV / TV, Tb. Th, Tb. Sp, and Tb. N, respectively. The use of distillation learning significantly improved the performance of the CT-based knee osteoarthritis classification method using the CNN approach, yielding an AUC score of 0.767 (95% CI, 0.681-0.853) instead of 0.658 (95% CI, 0.574-0.742) (p<.001). Conclusions: The proposed SRRD method showed high reliability and validity in MR-CT registration, regression, and knee osteoarthritis classification, indicating the feasibility of subchondral bone microstructural analysis based on CT images.  ( 3 min )
    Hyper-parameter Tuning for Adversarially Robust Models. (arXiv:2304.02497v2 [cs.LG] UPDATED)
    This work focuses on the problem of hyper-parameter tuning (HPT) for robust (i.e., adversarially trained) models, shedding light on the new challenges and opportunities arising during the HPT process for robust models. To this end, we conduct an extensive experimental study based on 3 popular deep models, in which we explore exhaustively 9 (discretized) HPs, 2 fidelity dimensions, and 2 attack bounds, for a total of 19208 configurations (corresponding to 50 thousand GPU hours). Through this study, we show that the complexity of the HPT problem is further exacerbated in adversarial settings due to the need to independently tune the HPs used during standard and adversarial training: succeeding in doing so (i.e., adopting different HP settings in both phases) can lead to a reduction of up to 80% and 43% of the error for clean and adversarial inputs, respectively. On the other hand, we also identify new opportunities to reduce the cost of HPT for robust models. Specifically, we propose to leverage cheap adversarial training methods to obtain inexpensive, yet highly correlated, estimations of the quality achievable using state-of-the-art methods. We show that, by exploiting this novel idea in conjunction with a recent multi-fidelity optimizer (taKG), the efficiency of the HPT process can be enhanced by up to 2.1x.  ( 2 min )
    Gait Characterization in Duchenne Muscular Dystrophy (DMD) Using a Single-Sensor Accelerometer: Classical Machine Learning and Deep Learning Approaches. (arXiv:2105.06295v3 [eess.SP] UPDATED)
Differences in gait patterns of children with Duchenne muscular dystrophy (DMD) and typically-developing (TD) peers are visible to the eye, but quantifications of those differences outside of the gait laboratory have been elusive. In this work, we measured vertical, mediolateral, and anteroposterior acceleration using a waist-worn iPhone accelerometer during ambulation across a typical range of velocities. Fifteen TD and fifteen DMD children from 3-16 years of age underwent eight walking/running activities, including five 25-meter walk/run speed-calibration tests at speeds ranging from a slow walk to a run (SC-L1 to SC-L5), a 6-minute walk test (6MWT), a 100-meter fast-walk/jog/run (100MRW), and a free walk (FW). For clinical anchoring purposes, participants completed a Northstar Ambulatory Assessment (NSAA). We extracted temporospatial gait clinical features (CFs) and applied multiple machine learning (ML) approaches to differentiate between DMD and TD children using the extracted CFs and raw data. Extracted temporospatial gait CFs showed reduced step length and a greater mediolateral component of total power (TP), consistent with the shorter strides and Trendelenburg-like gait commonly observed in DMD. ML approaches using temporospatial gait CFs and raw data varied in effectiveness at differentiating between DMD and TD controls at different speeds, with an accuracy of up to 100%. We demonstrate that by using ML with accelerometer data from a consumer-grade smartphone, we can capture DMD-associated gait characteristics in toddlers to teens.  ( 3 min )
    Directed Diffusion: Direct Control of Object Placement through Attention Guidance. (arXiv:2302.13153v2 [cs.CV] UPDATED)
Text-guided diffusion models such as DALLE-2, Imagen, and Stable Diffusion are able to generate an effectively endless variety of images given only a short text prompt describing the desired image content. In many cases the images are of very high quality. However, these models often struggle to compose scenes containing several key objects such as characters in specified positional relationships. The missing capability to "direct" the placement of characters and objects both within and across images is crucial in storytelling, as recognized in the literature on film and animation theory. In this work, we take a particularly straightforward approach to providing the needed direction. Drawing on the observation that the cross-attention maps for prompt words reflect the spatial layout of objects denoted by those words, we introduce an optimization objective that produces "activation" at desired positions in these cross-attention maps. The resulting approach is a step toward generalizing the applicability of text-guided diffusion models beyond single images to collections of related images, as in storybooks. To the best of our knowledge, our Directed Diffusion method is the first diffusion technique that provides positional control over multiple objects, while making use of an existing pre-trained model and maintaining a coherent blend between the positioned objects and the background. Moreover, it requires only a few lines to implement.  ( 3 min )
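As a rough sketch of optimizing cross-attention toward a desired position, one could define a loss that pushes one prompt token's attention mass into a target region; the tensor shapes and names below are illustrative assumptions, not the paper's implementation.

```python
import torch

def attention_placement_loss(attn_maps, token_idx, region_mask):
    """Illustrative loss encouraging high cross-attention for one prompt
    token inside a target region. attn_maps: (heads, H*W, n_tokens)
    cross-attention probabilities; region_mask: (H*W,) binary mask of the
    desired placement. A simplified stand-in for the directed-attention
    objective."""
    attn = attn_maps[..., token_idx]             # (heads, H*W)
    inside = (attn * region_mask).sum(dim=-1)    # attention mass in region
    total = attn.sum(dim=-1) + 1e-8
    return (1.0 - inside / total).mean()         # push mass into the region
```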
    Single-Model Attribution of Generative Models Through Final-Layer Inversion. (arXiv:2306.06210v2 [cs.CV] UPDATED)
Recent groundbreaking developments on generative modeling have sparked interest in practical single-model attribution. Such methods predict whether a sample was generated by a specific generator or not, for instance, to prove intellectual property theft. However, previous works are either limited to the closed-world setting or require undesirable changes to the generative model. We address these shortcomings by proposing FLIPAD, a new approach for single-model attribution in the open-world setting based on final-layer inversion and anomaly detection. We show that the utilized final-layer inversion can be reduced to a convex lasso optimization problem, making our approach theoretically sound and computationally efficient. The theoretical findings are accompanied by an experimental study demonstrating the effectiveness of our approach, outperforming the existing methods.  ( 2 min )
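The convex inversion step can be sketched with an off-the-shelf lasso solver; the shapes and the plain-lasso objective below are illustrative assumptions, not FLIPAD's exact formulation.

```python
import numpy as np
from sklearn.linear_model import Lasso

def final_layer_inversion(x, W, b, alpha=1e-3):
    """Sketch of final-layer inversion as a lasso problem: recover a sparse
    pre-activation z with W @ z + b ~ x, then use the residual (or the
    statistics of z) for anomaly detection. W: (n_out, n_latent),
    x, b: (n_out,)."""
    model = Lasso(alpha=alpha, fit_intercept=False, max_iter=10_000)
    model.fit(W, x - b)     # solves min ||(x - b) - W z||^2 + alpha * ||z||_1
    z = model.coef_
    residual = np.linalg.norm(W @ z + b - x)
    return z, residual
```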
    Distilling BlackBox to Interpretable models for Efficient Transfer Learning. (arXiv:2305.17303v7 [cs.CV] UPDATED)
Building generalizable AI models is one of the primary challenges in the healthcare domain. While radiologists rely on generalizable descriptive rules of abnormality, Neural Network (NN) models suffer even with a slight shift in input distribution (e.g., scanner type). Fine-tuning a model to transfer knowledge from one domain to another requires a significant amount of labeled data in the target domain. In this paper, we develop an interpretable model that can be efficiently fine-tuned to an unseen target domain with minimal computational cost. We assume the interpretable component of NN to be approximately domain-invariant. However, interpretable models typically underperform compared to their Blackbox (BB) variants. We start with a BB in the source domain and distill it into a mixture of shallow interpretable models using human-understandable concepts. As each interpretable model covers a subset of data, a mixture of interpretable models achieves performance comparable to the BB. Further, we use the pseudo-labeling technique from semi-supervised learning (SSL) to learn the concept classifier in the target domain, followed by fine-tuning the interpretable models in the target domain. We evaluate our model using a real-life large-scale chest-X-ray (CXR) classification dataset. The code is available at: https://github.com/batmanlab/MICCAI-2023-Route-interpret-repeat-CXRs  ( 3 min )
    Improving Code Example Recommendations on Informal Documentation Using BERT and Query-Aware LSH: A Comparative Study. (arXiv:2305.03017v2 [cs.SE] UPDATED)
Our research investigates the recommendation of code examples to aid software developers, a practice that saves developers significant time by providing ready-to-use code snippets. The focus of our study is Stack Overflow, a commonly used resource for coding discussions and solutions, particularly in the context of the Java programming language. We applied BERT, a powerful Large Language Model (LLM) that enables us to transform code examples into numerical vectors by extracting their semantic information. Once these numerical representations are prepared, we identify Approximate Nearest Neighbors (ANN) using Locality-Sensitive Hashing (LSH). Our research employed two variants of LSH: Random Hyperplane-based LSH and Query-Aware LSH. We rigorously compared these two approaches across four metrics: HitRate, Mean Reciprocal Rank (MRR), Average Execution Time, and Relevance. Our study revealed that the Query-Aware (QA) approach showed superior performance over the Random Hyperplane-based (RH) method. Specifically, it exhibited a notable improvement of 20% to 35% in HitRate for query pairs compared to the RH approach. Furthermore, the QA approach proved significantly more time-efficient, with its speed in creating hashing tables and assigning data samples to buckets being at least four times faster. It can return code examples within milliseconds, whereas the RH approach typically requires several seconds to recommend code examples. Due to the superior performance of the QA approach, we tested it against PostFinder and FaCoY, the state-of-the-art baselines. Our QA method showed comparable efficiency, proving its potential for effective code recommendation.  ( 3 min )
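The random-hyperplane baseline can be sketched in a few lines of NumPy; each bit is the sign of a projection onto a random hyperplane, so similar embedding vectors tend to share bucket keys. The bit count and bucket layout are illustrative assumptions, and the query-aware variant is more involved and not shown.

```python
import numpy as np

def random_hyperplane_lsh(embeddings, n_bits=16, seed=0):
    """Minimal random-hyperplane LSH over BERT-style embedding vectors.
    embeddings: (n_samples, dim). Returns a dict mapping bucket key ->
    list of sample indices."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_bits, embeddings.shape[1]))
    bits = (embeddings @ planes.T) > 0                     # (n_samples, n_bits)
    keys = bits.astype(np.int64) @ (1 << np.arange(n_bits, dtype=np.int64))
    buckets = {}
    for i, k in enumerate(keys):                           # group by bucket id
        buckets.setdefault(int(k), []).append(i)
    return buckets
```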
    A Survey on Explainable Anomaly Detection. (arXiv:2210.06959v2 [cs.LG] UPDATED)
    In the past two decades, most research on anomaly detection has focused on improving the accuracy of the detection, while largely ignoring the explainability of the corresponding methods and thus leaving the explanation of outcomes to practitioners. As anomaly detection algorithms are increasingly used in safety-critical domains, providing explanations for the high-stakes decisions made in those domains has become an ethical and regulatory requirement. Therefore, this work provides a comprehensive and structured survey on state-of-the-art explainable anomaly detection techniques. We propose a taxonomy based on the main aspects that characterize each explainable anomaly detection technique, aiming to help practitioners and researchers find the explainable anomaly detection method that best suits their needs.  ( 2 min )
    Successive Affine Learning for Deep Neural Networks. (arXiv:2305.07996v2 [cs.LG] UPDATED)
This paper introduces a successive affine learning (SAL) model for constructing deep neural networks (DNNs). Traditionally, a DNN is built by solving a non-convex optimization problem. It is often challenging to solve such a problem numerically due to its non-convexity and large number of layers. To address this challenge, inspired by the human education system, the multi-grade deep learning (MGDL) model was recently initiated by the author of this paper. The MGDL model learns a DNN in several grades, in each of which one constructs a shallow DNN consisting of a relatively small number of layers. The MGDL model still requires solving several non-convex optimization problems. The proposed SAL model mutates from the MGDL model. Noting that each layer of a DNN consists of an affine map followed by an activation function, we propose to learn the affine map by solving a quadratic/convex optimization problem which involves the activation function only after the weight matrix and the bias vector for the current layer have been trained. In the context of function approximation, for a given function the SAL model generates an expansion of the function with adaptive basis functions in the form of DNNs. We establish the Pythagorean identity and the Parseval identity for the system generated by the SAL model. Moreover, we provide a convergence theorem for the SAL process in the sense that either it terminates after a finite number of grades or the norms of its optimal error functions strictly decrease to a limit as the grade number increases to infinity. Furthermore, we present numerical examples as proof of concept which demonstrate that the proposed SAL model significantly outperforms the traditional deep learning model.  ( 3 min )
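As a toy illustration of fitting one affine map at a time by a convex subproblem, consider the following sketch; it compresses the grade-by-grade idea drastically (one scalar output per grade, plain least squares, fixed ReLU) and is not the paper's construction.

```python
import numpy as np

def successive_affine_fit(X, y, n_grades=3):
    """Toy grade-by-grade learning: at each grade fit an affine map by
    (convex) least squares, pass the prediction through a ReLU to form the
    next grade's features, and keep fitting the remaining residual."""
    relu = lambda t: np.maximum(t, 0.0)
    feats, residual, models = X, y.astype(float), []
    for _ in range(n_grades):
        A = np.hstack([feats, np.ones((feats.shape[0], 1))])  # affine design
        coef, *_ = np.linalg.lstsq(A, residual, rcond=None)   # convex subproblem
        models.append(coef)
        pred = A @ coef
        residual = residual - pred          # next grade fits what is left
        feats = relu(pred).reshape(-1, 1)   # activated output feeds forward
    return models
```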
    Exploring Image Augmentations for Siamese Representation Learning with Chest X-Rays. (arXiv:2301.12636v2 [eess.IV] UPDATED)
    Image augmentations are quintessential for effective visual representation learning across self-supervised learning techniques. While augmentation strategies for natural imaging have been studied extensively, medical images are vastly different from their natural counterparts. Thus, it is unknown whether common augmentation strategies employed in Siamese representation learning generalize to medical images and to what extent. To address this challenge, in this study, we systematically assess the effect of various augmentations on the quality and robustness of the learned representations. We train and evaluate Siamese Networks for abnormality detection on chest X-Rays across three large datasets (MIMIC-CXR, CheXpert and VinDR-CXR). We investigate the efficacy of the learned representations through experiments involving linear probing, fine-tuning, zero-shot transfer, and data efficiency. Finally, we identify a set of augmentations that yield robust representations that generalize well to both out-of-distribution data and diseases, while outperforming supervised baselines using just zero-shot transfer and linear probes by up to 20%. Our code is available at https://github.com/StanfordMIMI/siaug.  ( 2 min )
    Generative Pretrained Autoregressive Transformer Graph Neural Network applied to the Analysis and Discovery of Novel Proteins. (arXiv:2305.04934v2 [q-bio.BM] UPDATED)
    We report a flexible language-model based deep learning strategy, applied here to solve complex forward and inverse problems in protein modeling, based on an attention neural network that integrates transformer and graph convolutional architectures in a causal multi-headed graph mechanism, to realize a generative pretrained model. The model is applied to predict secondary structure content (per-residue level and overall content), protein solubility, and sequencing tasks. Further trained on inverse tasks, the model is rendered capable of designing proteins with these properties as target features. The model is formulated as a general framework, completely prompt-based, and can be adapted for a variety of downstream tasks. We find that adding additional tasks yields emergent synergies that the model exploits in improving overall performance, beyond what would be possible by training a model on each dataset alone. Case studies are presented to validate the method, yielding protein designs specifically focused on structural proteins, but also exploring the applicability in the design of soluble, antimicrobial biomaterials. While our model is trained to ultimately perform 8 distinct tasks, with available datasets it can be extended to solve additional problems. In a broader sense, this work illustrates a form of multiscale modeling that relates a set of ultimate building blocks (here, byte-level utf8 characters that define the nature of the physical system at hand) to complex output. This materiomic scheme captures complex emergent relationships between universal building block and resulting properties via a synergizing learning capacity to express a set of potentialities embedded in the knowledge used in training, via the interplay of universality and diversity.  ( 3 min )
    Prediction intervals for neural network models using weighted asymmetric loss functions. (arXiv:2210.04318v4 [stat.ML] UPDATED)
We propose a simple and efficient approach to generate prediction intervals (PIs) for approximated and forecasted trends. Our method leverages a weighted asymmetric loss function to estimate the lower and upper bounds of a PI, with the weights determined by the desired coverage probability. We provide a concise mathematical proof of the method, show how it can be extended to derive PIs for parametrised functions, and argue why the method works for predicting PIs of dependent variables. Tests of the method on a real-world forecasting task using a neural network-based model show that it can produce reliable PIs in complex machine learning scenarios.  ( 2 min )
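The weighted asymmetric loss is closely related to the classical pinball (quantile) loss; a minimal sketch under that pinball assumption, rather than the paper's exact weighting, is:

```python
import numpy as np

def weighted_asymmetric_loss(y_true, y_pred, q):
    """Pinball-style asymmetric loss: with weight q in (0, 1), under- and
    over-predictions are penalized differently, so a model minimizing it
    estimates the q-th quantile. Training two heads with q = 0.05 and
    q = 0.95 yields an approximate 90% prediction interval."""
    err = y_true - y_pred
    return np.mean(np.maximum(q * err, (q - 1.0) * err))
```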
    QI2 -- an Interactive Tool for Data Quality Assurance. (arXiv:2307.03419v2 [cs.CY] CROSS LISTED)
The importance of high data quality is increasing with the growing impact and distribution of ML systems and big data. The planned AI Act from the European Commission also defines challenging legal requirements for data quality, especially for the market introduction of safety-relevant ML systems. In this paper we introduce a novel approach that supports the data quality assurance process across multiple data quality aspects. This approach enables the verification of quantitative data quality requirements. The concept and benefits are introduced and explained on small example data sets. The application of the method is demonstrated on the well-known MNIST data set of handwritten digits.  ( 2 min )
    Hybrid quantum-classical machine learning for generative chemistry and drug design. (arXiv:2108.11644v2 [quant-ph] UPDATED)
    Deep generative chemistry models emerge as powerful tools to expedite drug discovery. However, the immense size and complexity of the structural space of all possible drug-like molecules pose significant obstacles, which could be overcome with hybrid architectures combining quantum computers with deep classical networks. As the first step toward this goal, we built a compact discrete variational autoencoder (DVAE) with a Restricted Boltzmann Machine (RBM) of reduced size in its latent layer. The size of the proposed model was small enough to fit on a state-of-the-art D-Wave quantum annealer and allowed training on a subset of the ChEMBL dataset of biologically active compounds. Finally, we generated 2331 novel chemical structures with medicinal chemistry and synthetic accessibility properties in the ranges typical for molecules from ChEMBL. The presented results demonstrate the feasibility of using already existing or soon-to-be-available quantum computing devices as testbeds for future drug discovery applications.  ( 2 min )
    Solving PDEs with Unmeasurable Source Terms Using Coupled Physics-Informed Neural Network with Recurrent Prediction for Soft Sensors. (arXiv:2301.08618v3 [cs.LG] UPDATED)
Partial differential equations (PDEs) are a model candidate for soft sensors in industrial processes with spatiotemporal dependence. Although physics-informed neural networks (PINNs) are a promising machine learning method for solving PDEs, they are infeasible for nonhomogeneous PDEs with unmeasurable source terms. To this end, a coupled PINN (CPINN) with a recurrent prediction (RP) learning strategy (CPINN-RP) is proposed. First, a CPINN composed of NetU and NetG is proposed: NetU approximates the PDE solution, NetG regularizes the training of NetU, and the two networks are integrated into a data-physics-hybrid loss function. We theoretically prove that the proposed CPINN has a satisfying approximation capability for solutions to nonhomogeneous PDEs with unmeasurable source terms, and, beyond the theoretical aspects, we propose a hierarchical training strategy to optimize and couple NetU and NetG. Second, NetU-RP is proposed to compensate for information loss in data sampling and improve prediction performance, where RP is the recurrently delayed outputs of the well-trained CPINN and hard sensors. Finally, artificial and practical datasets are used to verify the feasibility and effectiveness of CPINN-RP for soft sensors.  ( 3 min )
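Schematically, a data-physics-hybrid loss couples the two networks as below; `pde_residual` and all shapes are placeholders for illustration, not the paper's interface.

```python
import torch

def cpinn_hybrid_loss(net_u, net_g, x_data, u_data, x_coll, pde_residual):
    """Sketch of a data-physics-hybrid loss for a coupled PINN: NetU fits
    the measured solution while NetG stands in for the unmeasurable source
    term inside the PDE residual at collocation points. `pde_residual` is
    a user-supplied callable returning the residual tensor."""
    data_loss = torch.mean((net_u(x_data) - u_data) ** 2)     # fit measurements
    phys_loss = torch.mean(pde_residual(net_u, net_g, x_coll) ** 2)  # obey PDE
    return data_loss + phys_loss
```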
    Generalization Error of First-Order Methods for Statistical Learning with Generic Oracles. (arXiv:2307.04679v2 [cs.LG] UPDATED)
In this paper, we provide a novel framework for the analysis of the generalization error of first-order optimization algorithms for statistical learning when the gradient can only be accessed through partial observations given by an oracle. Our analysis relies on the regularity of the gradient w.r.t. the data samples and allows us to derive near-matching upper and lower bounds for the generalization error of multiple learning problems, including supervised learning, transfer learning, robust learning, distributed learning, and communication-efficient learning using gradient quantization. These results hold for smooth and strongly-convex optimization problems, as well as smooth non-convex optimization problems verifying a Polyak-Lojasiewicz assumption. In particular, our upper and lower bounds depend on a novel quantity that extends the notion of conditional standard deviation and is a measure of the extent to which the gradient can be approximated by having access to the oracle. As a consequence, our analysis provides a precise meaning to the intuition that optimization of the statistical learning objective is as hard as the estimation of its gradient. Finally, we show that, in the case of standard supervised learning, mini-batch gradient descent with increasing batch sizes and a warm start can reach a generalization error that is optimal up to a multiplicative factor, thus motivating the use of this optimization scheme in practical applications.  ( 3 min )
    Temporal Conditioning Spiking Latent Variable Models of the Neural Response to Natural Visual Scenes. (arXiv:2306.12045v2 [q-bio.NC] UPDATED)
Developing computational models of neural response is crucial for understanding sensory processing and neural computations. Current state-of-the-art neural network methods use temporal filters to handle temporal dependencies, resulting in an unrealistic and inflexible processing flow. Meanwhile, these methods target trial-averaged firing rates and fail to capture important features in spike trains. This work presents temporal conditioning spiking latent variable models (TeCoS-LVM) to simulate the neural response to natural visual stimuli. We use spiking neurons to produce spike outputs that directly match the recorded trains. This approach helps to avoid losing information embedded in the original spike trains. We exclude the temporal dimension from the model parameter space and introduce a temporal conditioning operation to allow the model to adaptively explore and exploit temporal dependencies in stimuli sequences in a natural paradigm. We show that TeCoS-LVM models can produce more realistic spike activities and fit spike statistics more accurately than powerful alternatives. Additionally, learned TeCoS-LVM models can generalize well to longer time scales. Overall, while remaining computationally tractable, our model effectively captures key features of neural coding systems. It thus provides a useful tool for building accurate predictive computational accounts for various sensory perception circuits.  ( 3 min )
    I2I: Initializing Adapters with Improvised Knowledge. (arXiv:2304.02168v2 [cs.CL] UPDATED)
    Adapters present a promising solution to the catastrophic forgetting problem in continual learning. However, training independent Adapter modules for every new task misses an opportunity for cross-task knowledge transfer. We propose Improvise to Initialize (I2I), a continual learning algorithm that initializes Adapters for incoming tasks by distilling knowledge from previously-learned tasks' Adapters. We evaluate I2I on CLiMB, a multimodal continual learning benchmark, by conducting experiments on sequences of visual question answering tasks. Adapters trained with I2I consistently achieve better task accuracy than independently-trained Adapters, demonstrating that our algorithm facilitates knowledge transfer between task Adapters. I2I also results in better cross-task knowledge transfer than the state-of-the-art AdapterFusion without incurring the associated parametric cost.  ( 2 min )
    High Dimensional Quantum Machine Learning With Small Quantum Computers. (arXiv:2203.13739v3 [quant-ph] UPDATED)
Quantum computers hold great promise to enhance machine learning, but their current qubit counts restrict the realisation of this promise. To mitigate this limitation, techniques can be applied for evaluating a quantum circuit using a machine with fewer qubits than the circuit naively requires. These techniques work by evaluating many smaller circuits on the smaller machine, which are then combined in a polynomial to replicate the output of the larger machine. This scheme requires more circuit evaluations than are practical for general circuits. However, we investigate the possibility that for certain applications many of these subcircuits are superfluous, and that a much smaller sum is sufficient to estimate the full circuit. We construct a machine learning model that may be capable of approximating the outputs of the larger circuit with far fewer circuit evaluations. We successfully apply our model to the task of digit recognition, using simulated quantum computers much smaller than the data dimension. The model is also applied to the task of approximating a random 10-qubit parameterized quantum circuit (PQC) with simulated access to a 5-qubit computer; even with only a relatively modest number of circuits, our model provides an accurate approximation of the 10-qubit PQC's output, superior to a neural network attempt. The developed method might be useful for implementing quantum models on larger data throughout the NISQ era.
    Protecting the Future: Neonatal Seizure Detection with Spatial-Temporal Modeling. (arXiv:2307.05382v1 [eess.SP])
A timely detection of seizures for newborn infants with electroencephalogram (EEG) has been a common yet life-saving practice in the Neonatal Intensive Care Unit (NICU). However, it requires considerable human effort for real-time monitoring, which calls for automated solutions to neonatal seizure detection. Moreover, current automated methods focusing on adult epilepsy monitoring often fail due to (i) dynamic seizure onset location in human brains; (ii) different montages on neonates and (iii) huge distribution shift among different subjects. In this paper, we propose a deep learning framework, namely STATENet, to address these challenges with exquisite designs at the temporal, spatial and model levels. Experiments on a real-world large-scale neonatal EEG dataset illustrate that our framework achieves significantly better seizure detection performance.
    Responsive parallelized architecture for deploying deep learning models in production environments. (arXiv:2112.08933v2 [cs.LG] UPDATED)
Recruiters can easily shortlist candidates for jobs by viewing their curriculum vitae (CV) documents. An unstructured CV document holds a candidate's portfolio and named-entity details. The main aim of this study is to design and propose a web-oriented, highly responsive computational pipeline that systematically predicts CV entities using hierarchically-refined label attention networks. Deep learning models specialized for named entity recognition were trained on a large dataset to predict relevant fields. The article suggests an optimal strategy for using a number of deep learning models in parallel and predicting in real time. We demonstrate the selection of a lightweight micro web framework using the Analytic Hierarchy Process and focus on an approach useful for deploying large deep learning model-based pipelines in production-ready environments using microservices. The deployed models and proposed architecture parse a typical CV in less than 700 milliseconds for a sequential flow of requests.
    AdaptiveRec: Adaptively Construct Pairs for Contrastive Learning in Sequential Recommendation. (arXiv:2307.05469v1 [cs.IR])
This paper presents a solution to the challenges faced by contrastive learning in sequential recommendation systems. In particular, it addresses the issue of false negatives, which limits the effectiveness of recommendation algorithms. By introducing an advanced approach to contrastive learning, the proposed method improves the quality of item embeddings and mitigates the problem of falsely categorizing similar instances as dissimilar. Experimental results demonstrate performance enhancements compared to existing systems. The flexibility and applicability of the proposed approach across various recommendation scenarios further highlight its value in enhancing sequential recommendation systems.
    Mitigating the Accuracy-Robustness Trade-off via Multi-Teacher Adversarial Distillation. (arXiv:2306.16170v2 [cs.LG] UPDATED)
Adversarial training is a practical approach for improving the robustness of deep neural networks against adversarial attacks. Although it brings reliable robustness, adversarial training negatively affects performance on clean examples, which means a trade-off exists between accuracy and robustness. Recently, some studies have tried to use knowledge distillation methods in adversarial training, achieving competitive robustness, but accuracy on clean samples remains limited. In this paper, to mitigate the accuracy-robustness trade-off, we introduce Multi-Teacher Adversarial Robustness Distillation (MTARD), which guides the model's adversarial training process by applying a strong clean teacher and a strong robust teacher to handle clean examples and adversarial examples, respectively. During the optimization process, to ensure that different teachers show similar knowledge scales, we design an Entropy-Based Balance algorithm to adjust each teacher's temperature and keep the teachers' information entropy consistent. Besides, to ensure that the student has a relatively consistent learning speed from multiple teachers, we propose a Normalization Loss Balance algorithm to adjust the learning weights of different types of knowledge. A series of experiments conducted on public datasets demonstrate that MTARD outperforms state-of-the-art adversarial training and distillation methods against various adversarial attacks.
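Stripped of the balancing algorithms, the two-teacher distillation term can be sketched as two softened KL divergences; the shared temperature and equal weighting here are simplifying assumptions, not the paper's tuned objective.

```python
import torch
import torch.nn.functional as F

def two_teacher_distill_loss(student_clean, student_adv,
                             teacher_clean, teacher_robust, T=4.0):
    """Sketch of a two-teacher distillation objective: the clean teacher
    supervises the student's logits on clean inputs, the robust teacher on
    adversarial inputs, via temperature-softened KL terms."""
    def kl(s_logits, t_logits):
        return F.kl_div(F.log_softmax(s_logits / T, dim=1),
                        F.softmax(t_logits / T, dim=1),
                        reduction="batchmean") * T * T
    return kl(student_clean, teacher_clean) + kl(student_adv, teacher_robust)
```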
    Continual Learning on Dynamic Graphs via Parameter Isolation. (arXiv:2305.13825v2 [cs.LG] UPDATED)
    Many real-world graph learning tasks require handling dynamic graphs where new nodes and edges emerge. Dynamic graph learning methods commonly suffer from the catastrophic forgetting problem, where knowledge learned for previous graphs is overwritten by updates for new graphs. To alleviate the problem, continual graph learning methods are proposed. However, existing continual graph learning methods aim to learn new patterns and maintain old ones with the same set of parameters of fixed size, and thus face a fundamental tradeoff between both goals. In this paper, we propose Parameter Isolation GNN (PI-GNN) for continual learning on dynamic graphs that circumvents the tradeoff via parameter isolation and expansion. Our motivation lies in that different parameters contribute to learning different graph patterns. Based on the idea, we expand model parameters to continually learn emerging graph patterns. Meanwhile, to effectively preserve knowledge for unaffected patterns, we find parameters that correspond to them via optimization and freeze them to prevent them from being rewritten. Experiments on eight real-world datasets corroborate the effectiveness of PI-GNN compared to state-of-the-art baselines.
    Defining data science: a new field of inquiry. (arXiv:2306.16177v2 [cs.LG] UPDATED)
Data science is not a science. It is a research paradigm. Its power, scope, and scale will surpass science, our most powerful research paradigm, to enable knowledge discovery and change our world. We have yet to understand and define it, which is vital to realizing its potential and managing its risks. Modern data science is in its infancy. Emerging slowly since 1962 and rapidly since 2000, it is a fundamentally new field of inquiry, one of the most active, powerful, and rapidly evolving 21st century innovations. Due to its value, power, and applicability, it is emerging in 40+ disciplines, hundreds of research areas, and thousands of applications. Millions of data science publications contain myriad definitions of data science and data science problem solving. Due to this infancy, many definitions are independent, application-specific, mutually incomplete, redundant, or inconsistent, and hence so is data science. This research addresses the multiple-definitions challenge by proposing the development of a coherent, unified definition based on a data science reference framework, using a data science journal for the data science community to achieve such a definition. This paper provides candidate definitions for the essential data science artifacts that are required to discuss such a definition. They are based on the classical research paradigm concept, consisting of a philosophy of data science, the data science problem solving paradigm, and the six-component data science reference framework (axiology, ontology, epistemology, methodology, methods, technology), a frequently called-for unifying framework with which to define, unify, and evolve data science. The paper presents challenges for defining data science, solution approaches (i.e., means for defining data science), and their requirements and benefits as the basis of a comprehensive solution.
    Randomized Exploration in Generalized Linear Bandits. (arXiv:1906.08947v3 [cs.LG] UPDATED)
    We study two randomized algorithms for generalized linear bandits. The first, GLM-TSL, samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution. The second, GLM-FPL, fits a GLM to a randomly perturbed history of past rewards. We analyze both algorithms and derive $\tilde{O}(d \sqrt{n \log K})$ upper bounds on their $n$-round regret, where $d$ is the number of features and $K$ is the number of arms. The former improves on prior work while the latter is the first for Gaussian noise perturbations in non-linear models. We empirically evaluate both GLM-TSL and GLM-FPL in logistic bandits, and apply GLM-FPL to neural network bandits. Our work showcases the role of randomization, beyond posterior sampling, in exploration.
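One round of the perturbed-history idea behind GLM-FPL can be sketched as follows; for simplicity this uses a linear link (ridge regression) instead of the logistic link analyzed in the paper, and the noise scale and solver are arbitrary assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

def fpl_choose_arm(history_X, history_y, arm_features, sigma=0.5, seed=None):
    """Follow-the-perturbed-leader step: refit the reward model on a
    Gaussian-perturbed copy of past rewards, then act greedily on it.
    history_X: (t, d) past contexts; history_y: (t,) past rewards;
    arm_features: (K, d) features of the available arms."""
    rng = np.random.default_rng(seed)
    y_pert = history_y + sigma * rng.standard_normal(len(history_y))
    model = Ridge(alpha=1.0).fit(history_X, y_pert)   # perturbed-history fit
    return int(np.argmax(model.predict(arm_features)))  # greedy arm choice
```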
    Automating Augmentation Through Random Unidimensional Search. (arXiv:2106.08756v3 [cs.LG] UPDATED)
    It is no secret amongst deep learning researchers that finding the optimal data augmentation strategy during training can mean the difference between state-of-the-art performance and a run-of-the-mill result. To that end, the community has seen many efforts to automate the process of finding the perfect augmentation procedure for any task at hand. Unfortunately, even recent cutting-edge methods bring massive computational overhead, requiring as many as 100 full model trainings to settle on an ideal configuration. We show how to achieve equivalent performance using just 6 trainings with Random Unidimensional Augmentation. Source code is available at https://github.com/fastestimator/RUA/tree/v1.0
    The Statistical Complexity of Interactive Decision Making. (arXiv:2112.13487v3 [cs.LG] UPDATED)
    A fundamental challenge in interactive learning and decision making, ranging from bandit problems to reinforcement learning, is to provide sample-efficient, adaptive learning algorithms that achieve near-optimal regret. This question is analogous to the classical problem of optimal (supervised) statistical learning, where there are well-known complexity measures (e.g., VC dimension and Rademacher complexity) that govern the statistical complexity of learning. However, characterizing the statistical complexity of interactive learning is substantially more challenging due to the adaptive nature of the problem. The main result of this work provides a complexity measure, the Decision-Estimation Coefficient, that is proven to be both necessary and sufficient for sample-efficient interactive learning. In particular, we provide: 1. a lower bound on the optimal regret for any interactive decision making problem, establishing the Decision-Estimation Coefficient as a fundamental limit. 2. a unified algorithm design principle, Estimation-to-Decisions (E2D), which transforms any algorithm for supervised estimation into an online algorithm for decision making. E2D attains a regret bound that matches our lower bound up to dependence on a notion of estimation performance, thereby achieving optimal sample-efficient learning as characterized by the Decision-Estimation Coefficient. Taken together, these results constitute a theory of learnability for interactive decision making. When applied to reinforcement learning settings, the Decision-Estimation Coefficient recovers essentially all existing hardness results and lower bounds. More broadly, the approach can be viewed as a decision-theoretic analogue of the classical Le Cam theory of statistical estimation; it also unifies a number of existing approaches -- both Bayesian and frequentist.
    The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks. (arXiv:2306.11680v2 [cs.LG] UPDATED)
    We study the implicit bias of batch normalization trained by gradient descent. We show that when learning a linear model with batch normalization for binary classification, gradient descent converges to a uniform margin classifier on the training data with an $\exp(-\Omega(\log^2 t))$ convergence rate. This distinguishes linear models with batch normalization from those without batch normalization in terms of both the type of implicit bias and the convergence rate. We further extend our result to a class of two-layer, single-filter linear convolutional neural networks, and show that batch normalization has an implicit bias towards a patch-wise uniform margin. Based on two examples, we demonstrate that patch-wise uniform margin classifiers can outperform the maximum margin classifiers in certain learning problems. Our results contribute to a better theoretical understanding of batch normalization.
    Domain-Agnostic Neural Architecture for Class Incremental Continual Learning in Document Processing Platform. (arXiv:2307.05399v1 [cs.LG])
Production deployments in complex systems require ML architectures to be highly efficient and usable against multiple tasks. Particularly demanding are classification problems in which data arrives in a streaming fashion and each class is presented separately. Recent methods with stochastic gradient learning have been shown to struggle in such setups or to have limitations such as memory buffers or being restricted to specific domains, which precludes their use in real-world scenarios. For this reason, we present a fully differentiable architecture based on the Mixture of Experts model that enables the training of high-performance classifiers when examples from each class are presented separately. We conducted exhaustive experiments that proved its applicability in various domains and its ability to learn online in production environments. The proposed technique achieves SOTA results without a memory buffer and clearly outperforms the reference methods.
Using BOLD-fMRI to Compute the Respiration Volume per Time (RVT) and Respiration Variation (RV) with Convolutional Neural Networks (CNN) in the Human Connectome Development Cohort. (arXiv:2307.05426v1 [eess.SP])
    In many fMRI studies, respiratory signals are unavailable or do not have acceptable quality. Consequently, the direct removal of low-frequency respiratory variations from BOLD signals is not possible. This study proposes a one-dimensional CNN model for reconstruction of two respiratory measures, RV and RVT. Results show that a CNN can capture informative features from resting BOLD signals and reconstruct realistic RV and RVT timeseries. It is expected that application of the proposed method will lower the cost of fMRI studies, reduce complexity, and decrease the burden on participants as they will not be required to wear a respiratory bellows.
    ISLTranslate: Dataset for Translating Indian Sign Language. (arXiv:2307.05440v1 [cs.CL])
    Sign languages are the primary means of communication for many hard-of-hearing people worldwide. Recently, to bridge the communication gap between the hard-of-hearing community and the rest of the population, several sign language translation datasets have been proposed to enable the development of statistical sign language translation systems. However, there is a dearth of sign language resources for the Indian sign language. This resource paper introduces ISLTranslate, a translation dataset for continuous Indian Sign Language (ISL) consisting of 31k ISL-English sentence/phrase pairs. To the best of our knowledge, it is the largest translation dataset for continuous Indian Sign Language. We provide a detailed analysis of the dataset. To validate the performance of existing end-to-end Sign language to spoken language translation systems, we benchmark the created dataset with a transformer-based model for ISL translation.
    Metropolis Sampling for Constrained Diffusion Models. (arXiv:2307.05439v1 [cs.LG])
    Denoising diffusion models have recently emerged as the predominant paradigm for generative modelling. Their extension to Riemannian manifolds has facilitated their application to an array of problems in the natural sciences. Yet, in many practical settings, such manifolds are defined by a set of constraints and are not covered by the existing (Riemannian) diffusion model methodology. Recent work has attempted to address this issue by employing novel noising processes based on logarithmic barrier methods or reflected Brownian motions. However, the associated samplers are computationally burdensome as the complexity of the constraints increases. In this paper, we introduce an alternative simple noising scheme based on Metropolis sampling that affords substantial gains in computational efficiency and empirical performance compared to the earlier samplers. Of independent interest, we prove that this new process corresponds to a valid discretisation of the reflected Brownian motion. We demonstrate the scalability and flexibility of our approach on a range of problem settings with convex and non-convex constraints, including applications from geospatial modelling, robotics and protein design.
    A Self-Supervised Algorithm for Denoising Photoplethysmography Signals for Heart Rate Estimation from Wearables. (arXiv:2307.05339v1 [eess.SP])
    Smart watches and other wearable devices are equipped with photoplethysmography (PPG) sensors for monitoring heart rate and other aspects of cardiovascular health. However, PPG signals collected from such devices are susceptible to corruption from noise and motion artifacts, which cause errors in heart rate estimation. Typical denoising approaches filter or reconstruct the signal in ways that eliminate much of the morphological information, even from the clean parts of the signal that would be useful to preserve. In this work, we develop an algorithm for denoising PPG signals that reconstructs the corrupted parts of the signal, while preserving the clean parts of the PPG signal. Our novel framework relies on self-supervised training, where we leverage a large database of clean PPG signals to train a denoising autoencoder. As we show, our reconstructed signals provide better estimates of heart rate from PPG signals than the leading heart rate estimation methods. Further experiments show significant improvement in Heart Rate Variability (HRV) estimation from PPG signals using our algorithm. We conclude that our algorithm denoises PPG signals in a way that can improve downstream analysis of many different health metrics from wearable devices.
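A generic self-supervised denoising setup of this kind pairs a 1-D convolutional autoencoder with synthetic corruption of clean windows; the architecture, layer sizes, and noise model below are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

class PPGDenoiser(nn.Module):
    """Minimal 1-D convolutional denoising autoencoder for PPG windows."""
    def __init__(self, ch=16):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv1d(1, ch, 9, stride=2, padding=4), nn.ReLU(),
            nn.Conv1d(ch, 2 * ch, 9, stride=2, padding=4), nn.ReLU())
        self.dec = nn.Sequential(
            nn.ConvTranspose1d(2 * ch, ch, 9, stride=2, padding=4,
                               output_padding=1), nn.ReLU(),
            nn.ConvTranspose1d(ch, 1, 9, stride=2, padding=4,
                               output_padding=1))

    def forward(self, x):          # x: (batch, 1, window_len)
        return self.dec(self.enc(x))

def train_step(model, clean, opt, noise_std=0.3):
    """Self-supervised objective: corrupt clean windows with synthetic
    noise and reconstruct the originals."""
    corrupted = clean + noise_std * torch.randn_like(clean)
    loss = nn.functional.mse_loss(model(corrupted), clean)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```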
    Action-based Early Autism Diagnosis Using Contrastive Feature Learning. (arXiv:2209.05379v3 [cs.CV] UPDATED)
Autism, also known as Autism Spectrum Disorder (or ASD), is a neurological disorder. Its main symptoms include difficulty in (verbal and/or non-verbal) communication and rigid/repetitive behavior. These symptoms are often indistinguishable from those of a normal (control) individual, so the disorder frequently remains undiagnosed in early childhood, leading to delayed treatment. Since the learning curve is steep during the initial age, an early diagnosis of autism could allow adequate interventions to be taken at the right time, which might positively affect the growth of an autistic child. Further, the traditional methods of autism diagnosis require multiple visits to a specialized psychiatrist, and this process can be time-consuming. In this paper, we present a learning-based approach to automate autism diagnosis using simple and small action video clips of subjects. This task is particularly challenging because the amount of annotated data available is small, and the variations among samples from the two categories (ASD and control) are generally indistinguishable. This is also evident from the poor performance of a binary classifier learned using the cross-entropy loss on top of a baseline encoder. To address this, we adopt contrastive feature learning in both self-supervised and supervised learning frameworks, and show that it can lead to a significant increase in the prediction accuracy of a binary classifier on this task. We further validate this by conducting thorough experimental analyses under different set-ups on two publicly available datasets.
    Improving the Security of Smartwatch Payment with Deep Learning. (arXiv:2307.05437v1 [cs.CR])
    Making contactless payments using a smartwatch is increasingly popular, but this payment medium lacks traditional biometric security measures such as facial or fingerprint recognition. In 2022, Sturgess et al. proposed WatchAuth, a system for authenticating smartwatch payments using the physical gesture of reaching towards a payment terminal. While effective, the system requires the user to undergo a burdensome enrolment period to achieve acceptable error levels. In this dissertation, we explore whether applications of deep learning can reduce the number of gestures a user must provide to enrol into an authentication system for smartwatch payment. We firstly construct a deep-learned authentication system that outperforms the current state-of-the-art, including in a scenario where the target user has provided a limited number of gestures. We then develop a regularised autoencoder model for generating synthetic user-specific gestures. We show that using these gestures in training improves classification ability for an authentication system. Through this technique we can reduce the number of gestures required to enrol a user into a WatchAuth-like system without negatively impacting its error rates.
    A Physics-Informed Low-Shot Learning For sEMG-Based Estimation of Muscle Force and Joint Kinematics. (arXiv:2307.05361v1 [eess.SP])
Muscle force and joint kinematics estimation from surface electromyography (sEMG) are essential for real-time biomechanical analysis of the dynamic interplay among neural muscle stimulation, muscle dynamics, and kinetics. Recent advances in deep neural networks (DNNs) have shown the potential to improve biomechanical analysis in a fully automated and reproducible manner. However, the small sample nature and physical interpretability of biomechanical analysis limit the applications of DNNs. This paper presents a novel physics-informed low-shot learning method for sEMG-based estimation of muscle force and joint kinematics. This method seamlessly integrates Lagrange's equation of motion and an inverse dynamic muscle model into the generative adversarial network (GAN) framework for structured feature decoding and extrapolated estimation from small sample data. Specifically, Lagrange's equation of motion is introduced into the generative model to restrain the structured decoding of the high-level features following the laws of physics. A physics-informed policy gradient is designed to improve the adversarial learning efficiency by rewarding the consistent physical representation of the extrapolated estimations and the physical references. Experimental validations are conducted on two scenarios (i.e., walking trials and wrist motion trials). Results indicate that the estimations of the muscle forces and joint kinematics are unbiased compared to the physics-based inverse dynamics, which outperforms the selected benchmark methods, including the physics-informed convolutional neural network (PI-CNN), the vanilla generative adversarial network (GAN), and the multi-layer extreme learning machine (ML-ELM).
    U-CREAT: Unsupervised Case Retrieval using Events extrAcTion. (arXiv:2307.05260v1 [cs.IR])
The task of Prior Case Retrieval (PCR) in the legal domain is about automatically citing relevant (based on facts and precedence) prior legal cases for a given query case. To further promote research in PCR, in this paper we propose a new large benchmark (in English) for the PCR task: the IL-PCR (Indian Legal Prior Case Retrieval) corpus. Given the complex nature of case relevance and the length of legal documents, BM25 remains a strong baseline for ranking cited prior documents. In this work, we explore the role of events in legal case retrieval and propose an unsupervised retrieval pipeline, U-CREAT (Unsupervised Case Retrieval using Events Extraction). We find that the proposed unsupervised retrieval method significantly increases performance compared to BM25 and makes retrieval faster by a considerable margin, making it applicable to real-time case retrieval systems. Our proposed system is generic: we show that it generalizes across two different legal systems (Indian and Canadian) and achieves state-of-the-art performance on the benchmarks for both legal systems (the IL-PCR and COLIEE corpora).
    Boosting Feedback Efficiency of Interactive Reinforcement Learning by Adaptive Learning from Scores. (arXiv:2307.05405v1 [cs.RO])
Interactive reinforcement learning has shown promise in learning complex robotic tasks. However, the process can be human-intensive due to the large amount of interactive feedback required. This paper presents a new method that uses scores provided by humans, instead of pairwise preferences, to improve the feedback efficiency of interactive reinforcement learning. Our key insight is that scores can yield significantly more data than pairwise preferences. Specifically, we require a teacher to interactively score the full trajectories of an agent to train a behavioral policy in a sparse reward environment. To prevent unstable human-provided scores from negatively impacting the training process, we propose an adaptive learning scheme. This enables the learning paradigm to be insensitive to imperfect or unreliable scores. We extensively evaluate our method on robotic locomotion and manipulation tasks. The results show that the proposed method can efficiently learn near-optimal policies by adaptively learning from scores, while requiring less feedback compared to pairwise preference learning methods. The source code is publicly available at https://github.com/SSKKai/Interactive-Scoring-IRL.
    A Survey From Distributed Machine Learning to Distributed Deep Learning. (arXiv:2307.05232v1 [cs.LG])
Artificial intelligence has achieved significant success in handling complex tasks in recent years. This success is due to advances in machine learning algorithms and hardware acceleration. In order to obtain more accurate results and solve more complex problems, algorithms must be trained with more data. This huge amount of data can be time-consuming to process and requires a great deal of computation. A solution is to distribute the data and algorithm across several machines, which is known as distributed machine learning. There has been considerable effort put into distributed machine learning algorithms, and different methods have been proposed so far. In this article, we present a comprehensive summary of the current state-of-the-art in the field through a review of these algorithms. We divide the algorithms into classification and clustering (traditional machine learning), deep learning, and deep reinforcement learning groups. Distributed deep learning has gained more attention in recent years, and most studies have focused on these algorithms; as a result, most of the articles we discuss here belong to this category. Based on our investigation of the algorithms, we highlight limitations that should be addressed in future research.
    Unleashing the Potential of Regularization Strategies in Learning with Noisy Labels. (arXiv:2307.05025v1 [cs.LG])
In recent years, research on learning with noisy labels has focused on devising novel algorithms that can achieve robustness to noisy training labels while generalizing to clean data. These algorithms often incorporate sophisticated techniques, such as noise modeling, label correction, and co-training. In this study, we demonstrate that a simple baseline using cross-entropy loss, combined with widely used regularization strategies like learning rate decay, model weight averaging, and data augmentation, can outperform state-of-the-art methods. Our findings suggest that employing a combination of regularization strategies can be more effective than intricate algorithms in tackling the challenges of learning with noisy labels. While some of these regularization strategies have been utilized in previous noisy label learning research, their full potential has not been thoroughly explored. Our results encourage a reevaluation of benchmarks for learning with noisy labels and prompt reconsideration of the role of specialized learning algorithms designed for training with noisy labels.
    Attribute Controlled Dialogue Prompting. (arXiv:2307.05228v1 [cs.CL])
    Prompt-tuning has become an increasingly popular parameter-efficient method for adapting large pretrained language models to downstream tasks. However, both discrete prompting and continuous prompting assume fixed prompts for all data samples within a task, neglecting the fact that inputs vary greatly in some tasks such as open-domain dialogue generation. In this paper, we present a novel, instance-specific prompt-tuning algorithm for dialogue generation. Specifically, we generate prompts based on instance-level control code, rather than the conversation history, to explore their impact on controlled dialogue generation. Experiments on popular open-domain dialogue datasets, evaluated on both automated metrics and human evaluation, demonstrate that our method is superior to prompting baselines and comparable to fine-tuning with only 5%-6% of total parameters.
    Self-Supervised Learning with Lie Symmetries for Partial Differential Equations. (arXiv:2307.05432v1 [cs.LG])
    Machine learning for differential equations paves the way for computationally efficient alternatives to numerical solvers, with potentially broad impacts in science and engineering. Though current algorithms typically require simulated training data tailored to a given setting, one may instead wish to learn useful information from heterogeneous sources, or from real dynamical systems observations that are messy or incomplete. In this work, we learn general-purpose representations of PDEs from heterogeneous data by implementing joint embedding methods for self-supervised learning (SSL), a framework for unsupervised representation learning that has had notable success in computer vision. Our representation outperforms baseline approaches to invariant tasks, such as regressing the coefficients of a PDE, while also improving the time-stepping performance of neural solvers. We hope that our proposed methodology will prove useful in the eventual development of general-purpose foundation models for PDEs.
    The Value of Chess Squares. (arXiv:2307.05330v1 [cs.AI])
Valuing chess squares and determining the placement of pieces on the board are the main objectives of our study. With the emergence of chess AI, it has become possible to accurately assess the worth of positions in a game of chess. The conventional approach assigns fixed values to pieces (King = ∞, Queen = 9, Rook = 5, Bishop = 3, Knight = 3, Pawn = 1). We enhance this analysis by introducing marginal valuations for both pieces and squares. We demonstrate our method by examining the positioning of Knights and Bishops, and also provide valuable insights into the valuation of pawns. Notably, Nimzowitsch was among the pioneers in advocating for the significance of pawn structure and valuation. Finally, we conclude by suggesting potential avenues for future research.
    Membership Inference Attacks on DNNs using Adversarial Perturbations. (arXiv:2307.05193v1 [cs.LG])
Several membership inference (MI) attacks have been proposed to audit a target DNN. Given a set of subjects, MI attacks tell which subjects the target DNN has seen during training. This work focuses on the post-training MI attacks emphasizing high confidence membership detection -- True Positive Rates (TPR) at low False Positive Rates (FPR). Current works in this category -- likelihood ratio attack (LiRA) and enhanced MI attack (EMIA) -- only perform well on complex datasets (e.g., CIFAR-10 and Imagenet) where the target DNN overfits its train set, but perform poorly on simpler datasets (0% TPR by both attacks on Fashion-MNIST, 2% and 0% TPR respectively by LiRA and EMIA on MNIST at 1% FPR). To address this, firstly, we unify current MI attacks by presenting a framework divided into three stages -- preparation, indication and decision. Secondly, we utilize the framework to propose two novel attacks: (1) Adversarial Membership Inference Attack (AMIA) efficiently utilizes the membership and the non-membership information of the subjects while adversarially minimizing a novel loss function, achieving 6% TPR on both Fashion-MNIST and MNIST datasets; and (2) Enhanced AMIA (E-AMIA) combines EMIA and AMIA to achieve 8% and 4% TPRs on Fashion-MNIST and MNIST datasets respectively, at 1% FPR. Thirdly, we introduce two novel augmented indicators that positively leverage the loss information in the Gaussian neighborhood of a subject. This improves TPR of all four attacks on average by 2.5% and 0.25% respectively on Fashion-MNIST and MNIST datasets at 1% FPR. Finally, we propose a simple yet novel evaluation metric, the running TPR average (RTA) at a given FPR, that better distinguishes different MI attacks in the low FPR region. We also show that AMIA and E-AMIA are more transferable to unknown DNNs (other than the target DNN) and are more robust to DP-SGD training as compared to LiRA and EMIA.
    Unbiased Pain Assessment through Wearables and EHR Data: Multi-attribute Fairness Loss-based CNN Approach. (arXiv:2307.05333v1 [eess.SP])
    The combination of diverse health data (IoT, EHR, and clinical surveys) and scalable-adaptable Artificial Intelligence (AI) has enabled the discovery of physical, behavioral, and psycho-social indicators of pain status. Despite the hype and promise to fundamentally alter the healthcare system with technological advancements, much AI adoption in clinical pain evaluation has been hampered by the heterogeneity of the problem itself and other challenges, such as personalization and fairness. Studies have revealed that many AI (i.e., machine learning or deep learning) models display biases and discriminate against specific population segments (such as those based on gender or ethnicity), which breeds skepticism among medical professionals about AI adaptability. In this paper, we propose a Multi-attribute Fairness Loss (MAFL) based CNN model that aims to account for any sensitive attributes included in the data and fairly predict patients' pain status while attempting to minimize the discrepancies between privileged and unprivileged groups. In order to determine whether the trade-off between accuracy and fairness can be satisfied, we compare the proposed model with well-known existing mitigation procedures, and our study reveals that the implemented model performs favorably in contrast to state-of-the-art methods. We utilize NIH All-Of-Us data, considering a cohort of 868 distinct individuals with wearables and EHR data gathered over 1500 days, to analyze our suggested fair pain assessment system.
    A Deep Dive into Perturbations as Evaluation Technique for Time Series XAI. (arXiv:2307.05104v1 [cs.LG])
    Explainable Artificial Intelligence (XAI) has gained significant attention recently as the demand for transparency and interpretability of machine learning models has increased. In particular, XAI for time series data has become increasingly important in finance, healthcare, and climate science. However, evaluating the quality of explanations, such as attributions provided by XAI techniques, remains challenging. This paper provides an in-depth analysis of using perturbations to evaluate attributions extracted from time series models. A perturbation analysis involves systematically modifying the input data and evaluating the impact on the attributions generated by the XAI method. We apply this approach to several state-of-the-art XAI techniques and evaluate their performance on three time series classification datasets. Our results demonstrate that the perturbation analysis approach can effectively evaluate the quality of attributions and provide insights into the strengths and limitations of XAI techniques. Such an approach can guide the selection of XAI methods for time series data, e.g., focusing on return time rather than precision, and facilitate the development of more reliable and interpretable machine learning models for time series analysis.
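    A basic perturbation analysis of the kind described can be sketched in a few lines; the example below assumes a hypothetical scalar-output model and a per-time-step attribution array, masking the top-attributed steps and measuring the change in prediction.
    ```python
    # A minimal sketch of a perturbation analysis for time series attributions:
    # zero out the most-attributed time steps and measure the prediction change.
    import numpy as np

    def perturbation_drop(predict, x, attributions, frac=0.1, baseline=0.0):
        """Perturb the `frac` most-attributed time steps and return the change
        in prediction. `predict` maps a (T,) series to a scalar score."""
        x = np.asarray(x, dtype=float)
        k = max(1, int(frac * x.size))
        top_idx = np.argsort(np.abs(attributions))[-k:]  # most important steps
        x_perturbed = x.copy()
        x_perturbed[top_idx] = baseline                  # replace with a baseline value
        return predict(x) - predict(x_perturbed)

    # Toy example: the "model" sums the series, and attribution equals the input,
    # so masking the top steps causes a large, measurable drop.
    series = np.sin(np.linspace(0, 6, 100)) + 1.0
    print(perturbation_drop(lambda s: s.sum(), series, attributions=series))
    ```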
    Application of data engineering approaches to address challenges in microbiome data for optimal medical decision-making. (arXiv:2307.00033v2 [q-bio.QM] UPDATED)
    The human gut microbiota is known to contribute to numerous physiological functions of the body and is also implicated in a myriad of pathological conditions. Prolific research work in the past few decades has yielded valuable information regarding the relative taxonomic distribution of gut microbiota. Unfortunately, microbiome data suffer from class imbalance and high dimensionality issues that must be addressed. In this study, we implement data engineering algorithms to address the above-mentioned issues inherent to microbiome data. Four standard machine learning classifiers (logistic regression (LR), support vector machines (SVM), random forests (RF), and extreme gradient boosting (XGB) decision trees) were implemented on a previously published dataset. The issues of class imbalance and high dimensionality were addressed through the synthetic minority oversampling technique (SMOTE) and principal component analysis (PCA). Our results indicate that ensemble classifiers (RF and XGB decision trees) exhibit superior classification accuracy in predicting the host phenotype. The application of PCA significantly reduced testing time while maintaining high classification accuracy. The highest classification accuracy was obtained at the species level for most classifiers. The prototype employed in the study addresses the issues inherent to microbiome datasets and could be highly beneficial for providing personalized medicine.
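    A minimal sketch of the described preprocessing pipeline, assuming scikit-learn and imbalanced-learn are available and using synthetic stand-in data: SMOTE to rebalance classes, PCA to reduce dimensionality, then a random-forest classifier.
    ```python
    # A minimal sketch of the SMOTE + PCA + classifier recipe; synthetic data
    # stands in for the microbiome abundance matrix. Requires scikit-learn
    # and imbalanced-learn.
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline
    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=500, n_features=200,
                               weights=[0.9, 0.1], random_state=0)

    pipeline = Pipeline([
        ("smote", SMOTE(random_state=0)),       # oversample the minority class
        ("pca", PCA(n_components=20)),          # reduce dimensionality
        ("rf", RandomForestClassifier(random_state=0)),
    ])
    print(cross_val_score(pipeline, X, y, cv=5, scoring="balanced_accuracy").mean())
    ```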
    Transaction Fraud Detection via Spatial-Temporal-Aware Graph Transformer. (arXiv:2307.05121v1 [cs.LG])
    How to obtain informative representations of transactions and then perform the identification of fraudulent transactions is a crucial part of ensuring financial security. Recent studies apply Graph Neural Networks (GNNs) to the transaction fraud detection problem. Nevertheless, they encounter challenges in effectively learning spatial-temporal information due to structural limitations. Moreover, few prior GNN-based detectors have recognized the significance of incorporating global information, which encompasses similar behavioral patterns and offers valuable insights for discriminative representation learning. Therefore, we propose a novel heterogeneous graph neural network called Spatial-Temporal-Aware Graph Transformer (STA-GT) for the transaction fraud detection problem. Specifically, we design a temporal encoding strategy to capture temporal dependencies and incorporate it into the graph neural network framework, enhancing spatial-temporal information modeling and improving expressive ability. Furthermore, we introduce a transformer module to learn local and global information. Pairwise node-node interactions overcome the limitation of the GNN structure and build up interactions between the target node and long-distance ones. Experimental results on two financial datasets demonstrate that our proposed method STA-GT outperforms general GNN models and GNN-based fraud detectors on the transaction fraud detection task.
    Classification of sleep stages from EEG, EOG and EMG signals by SSNet. (arXiv:2307.05373v1 [eess.SP])
    Classification of sleep stages plays an essential role in diagnosing sleep-related diseases, including Sleep Disorder Breathing (SDB) disease. In this study, we propose an end-to-end deep learning architecture, named SSNet, which comprises two deep learning networks based on Convolutional Neural Networks (CNN) and Long Short Term Memory (LSTM). Both deep learning networks extract features from the combination of Electrooculogram (EOG), Electroencephalogram (EEG), and Electromyogram (EMG) signals, as each signal has distinct features that help in the classification of sleep stages. The features produced by the two deep learning networks are concatenated and passed to the fully connected layer for classification. The performance of our proposed model is evaluated using two public datasets, the Sleep-EDF Expanded dataset and the ISRUC-Sleep dataset. The accuracy and Kappa coefficient are 96.36% and 93.40%, respectively, for classifying three classes of sleep stages using the Sleep-EDF Expanded dataset, and 96.57% and 83.05%, respectively, for five classes of sleep stages using the Sleep-EDF Expanded dataset. Our model achieves the best performance in classifying sleep stages when compared with state-of-the-art techniques.
    Multi-Task Learning to Enhance Generalizability of Neural Network Equalizers in Coherent Optical Systems. (arXiv:2307.05374v1 [eess.SP])
    For the first time, multi-task learning is proposed to improve the flexibility of NN-based equalizers in coherent systems. A "single" NN-based equalizer improves the Q-factor by up to 4 dB compared to chromatic dispersion compensation (CDC), without re-training, even with variations in launch power, symbol rate, or transmission distance.
    Fast Neural Network Inference on FPGAs for Triggering on Long-Lived Particles at Colliders. (arXiv:2307.05152v1 [hep-ex])
    Experimental particle physics demands a sophisticated trigger and acquisition system capable of efficiently retaining the collisions of interest for further investigation. Heterogeneous computing with the employment of FPGA cards may emerge as a trending technology for the triggering strategy of the upcoming high-luminosity program of the Large Hadron Collider at CERN. In this context, we present two machine-learning algorithms for selecting events where neutral long-lived particles decay within the detector volume, studying their accuracy and inference time when accelerated on commercially available Xilinx FPGA accelerator cards. The inference time is also compared against CPU- and GPU-based hardware setups. The proposed new algorithms are proven efficient for the considered benchmark physics scenario and their accuracy is found not to degrade when accelerated on the FPGA cards. The results indicate that all tested architectures fit within the latency requirements of a second-level trigger farm and that exploiting accelerator technologies for real-time processing of particle-physics collisions is a promising research field that deserves additional investigations, in particular with machine-learning models with a large number of trainable parameters.
    Decorrelation using Optimal Transport. (arXiv:2307.05187v1 [hep-ph])
    Being able to decorrelate a feature space from protected attributes is an area of active research and study in ethics, fairness, and also natural sciences. We introduce a novel decorrelation method using Convex Neural Optimal Transport Solvers (Cnots), that is able to decorrelate continuous feature space against protected attributes with optimal transport. We demonstrate how well it performs in the context of jet classification in high energy physics, where classifier scores are desired to be decorrelated from the mass of a jet. The decorrelation achieved in binary classification approaches the levels achieved by the state-of-the-art using conditional normalising flows. When moving to multiclass outputs the optimal transport approach performs significantly better than the state-of-the-art, suggesting substantial gains at decorrelating multidimensional feature spaces.
    $\ell_p$-Regression in the Arbitrary Partition Model of Communication. (arXiv:2307.05117v1 [cs.DS])
    We consider the randomized communication complexity of the distributed $\ell_p$-regression problem in the coordinator model, for $p\in (0,2]$. In this problem, there is a coordinator and $s$ servers. The $i$-th server receives $A^i\in\{-M, -M+1, \ldots, M\}^{n\times d}$ and $b^i\in\{-M, -M+1, \ldots, M\}^n$ and the coordinator would like to find a $(1+\epsilon)$-approximate solution to $\min_{x\in\mathbb{R}^d} \|(\sum_i A^i)x - (\sum_i b^i)\|_p$. Here $M \leq \mathrm{poly}(nd)$ for convenience. This model, where the data is additively shared across servers, is commonly referred to as the arbitrary partition model. We obtain significantly improved bounds for this problem. For $p = 2$, i.e., least squares regression, we give the first optimal bound of $\tilde{\Theta}(sd^2 + sd/\epsilon)$ bits. For $p \in (1,2)$, we obtain an $\tilde{O}(sd^2/\epsilon + sd/\mathrm{poly}(\epsilon))$ upper bound. Notably, for $d$ sufficiently large, our leading order term only depends linearly on $1/\epsilon$ rather than quadratically. We also show communication lower bounds of $\Omega(sd^2 + sd/\epsilon^2)$ for $p\in (0,1]$ and $\Omega(sd^2 + sd/\epsilon)$ for $p\in (1,2]$. Our bounds considerably improve previous bounds due to (Woodruff et al., COLT 2013) and (Vempala et al., SODA 2020).
    Contextual Pre-Planning on Reward Machine Abstractions for Enhanced Transfer in Deep Reinforcement Learning. (arXiv:2307.05209v1 [cs.AI])
    Recent studies show that deep reinforcement learning (DRL) agents tend to overfit to the task on which they were trained and fail to adapt to minor environment changes. To expedite learning when transferring to unseen tasks, we propose a novel approach to representing the current task using reward machines (RM), state machine abstractions that induce subtasks based on the current task's rewards and dynamics. Our method provides agents with symbolic representations of optimal transitions from their current abstract state and rewards them for achieving these transitions. These representations are shared across tasks, allowing agents to exploit knowledge of previously encountered symbols and transitions, thus enhancing transfer. Our empirical evaluation shows that our representations improve sample efficiency and few-shot transfer in a variety of domains.
    SimpleMTOD: A Simple Language Model for Multimodal Task-Oriented Dialogue with Symbolic Scene Representation. (arXiv:2307.04907v1 [cs.CL])
    SimpleMTOD is a simple language model which recasts several sub-tasks in multimodal task-oriented dialogues as sequence prediction tasks. SimpleMTOD is built on a large-scale transformer-based auto-regressive architecture, which has already proven to be successful in uni-modal task-oriented dialogues, and effectively leverages transfer learning from pre-trained GPT-2. In order to capture the semantics of visual scenes, we introduce both local and de-localized tokens for objects within a scene. De-localized tokens represent the type of an object rather than the specific object itself and so possess a consistent meaning across the dataset. SimpleMTOD achieves a state-of-the-art BLEU score (0.327) in the Response Generation sub-task of the SIMMC 2.0 test-std dataset while performing on par in other multimodal sub-tasks: Disambiguation, Coreference Resolution, and Dialog State Tracking. This is despite taking a minimalist approach for extracting visual (and non-visual) information. In addition, the model does not rely on task-specific architectural changes such as classification heads.
    Control as Probabilistic Inference as an Emergent Communication Mechanism in Multi-Agent Reinforcement Learning. (arXiv:2307.05004v1 [cs.AI])
    This paper proposes a generative probabilistic model integrating emergent communication and multi-agent reinforcement learning. The agents plan their actions by probabilistic inference, called control as inference, and communicate using messages that are latent variables estimated based on the planned actions. Through these messages, each agent can send information about its actions and know information about the actions of another agent. Therefore, the agents change their actions according to the estimated messages to achieve cooperative tasks. This inference of messages can be considered communication, and the procedure can be formulated by the Metropolis-Hastings naming game. Through experiments in a grid-world environment, we show that the proposed probabilistic generative model (PGM) can infer meaningful messages to achieve the cooperative task.
    On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis. (arXiv:2307.05132v1 [eess.AS])
    Self-supervised learning (SSL) speech representations learned from large amounts of diverse, mixed-quality speech data without transcriptions are gaining ground in many speech technology applications. Prior work has shown that SSL is an effective intermediate representation in two-stage text-to-speech (TTS) for both read and spontaneous speech. However, it is still not clear which SSL model, and which layer within it, is most suited for spontaneous TTS. We address this shortcoming by extending the scope of comparison for SSL in spontaneous TTS to 6 different SSL models and 3 layers within each model. Furthermore, SSL has also shown potential in predicting the mean opinion scores (MOS) of synthesized speech, but this has only been done for read-speech MOS prediction. We extend an SSL-based MOS prediction framework previously developed for scoring read speech synthesis and evaluate its performance on synthesized spontaneous speech. All experiments are conducted twice on two different spontaneous corpora in order to find generalizable trends. Overall, we present comprehensive experimental results on the use of SSL in spontaneous TTS and MOS prediction to further quantify and understand how SSL can be used in spontaneous TTS. Audio samples: https://www.speech.kth.se/tts-demos/sp_ssl_tts
    Using Linear Regression for Iteratively Training Neural Networks. (arXiv:2307.05189v1 [cs.LG])
    We present a simple linear regression based approach for learning the weights and biases of a neural network, as an alternative to standard gradient based backpropagation. The present work is exploratory in nature, and we restrict the description and experiments to (i) simple feedforward neural networks, (ii) scalar (single output) regression problems, and (iii) invertible activation functions. However, the approach is intended to be extensible to larger, more complex architectures. The key idea is the observation that the input to every neuron in a neural network is a linear combination of the activations of neurons in the previous layer, as well as the parameters (weights and biases) of the layer. If we are able to compute the ideal total input values to every neuron by working backwards from the output, we can formulate the learning problem as a linear least squares problem which iterates between updating the parameters and the activation values. We present an explicit algorithm that implements this idea, and we show that (at least for simple problems) the approach is more stable and faster than gradient-based backpropagation.
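    The alternating least-squares idea can be illustrated on a one-hidden-layer scalar regressor with an invertible activation; the sketch below is our illustration of the stated idea (including a least-norm choice of target activations), not the paper's exact algorithm.
    ```python
    # A minimal sketch, assuming a 1-hidden-layer network with tanh activation:
    # alternate between a least-squares solve for the output layer and a solve
    # for the hidden layer against back-computed "ideal" pre-activations.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, (200, 2))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1]              # toy regression target

    H = 16                                           # hidden width
    W1, b1 = rng.normal(0, 0.5, (2, H)), np.zeros(H)

    for _ in range(20):
        A = np.tanh(X @ W1 + b1)                     # hidden activations
        # 1) Output is linear in (A, 1): solve for w2, b2 by least squares.
        theta, *_ = np.linalg.lstsq(np.column_stack([A, np.ones(len(A))]), y, rcond=None)
        w2, b2 = theta[:-1], theta[-1]
        resid = y - (A @ w2 + b2)
        # 2) Back out "ideal" hidden activations (a least-norm correction),
        #    invert tanh to get ideal pre-activations, then solve for W1, b1.
        A_target = A + np.outer(resid, w2) / (w2 @ w2 + 1e-8)
        Z_target = np.arctanh(np.clip(A_target, -0.999, 0.999))
        Theta, *_ = np.linalg.lstsq(np.column_stack([X, np.ones(len(X))]), Z_target, rcond=None)
        W1, b1 = Theta[:-1], Theta[-1]

    print("MSE:", float(np.mean(resid ** 2)))
    ```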
    Feature Activation Map: Visual Explanation of Deep Learning Models for Image Classification. (arXiv:2307.05017v1 [cs.CV])
    Decisions made by convolutional neural networks (CNNs) can be understood and explained by visualizing discriminative regions on images. To this end, Class Activation Map (CAM) based methods were proposed as powerful interpretation tools, making the prediction of deep learning models more explainable, transparent, and trustworthy. However, all the CAM-based methods (e.g., CAM, Grad-CAM, and Relevance-CAM) can only be used for interpreting CNN models with fully-connected (FC) layers as a classifier. It is worth noting that many deep learning models classify images without FC layers, e.g., few-shot learning image classification, contrastive learning image classification, and image retrieval tasks. In this work, a post-hoc interpretation tool named feature activation map (FAM) is proposed, which can interpret deep learning models without FC layers as a classifier. In the proposed FAM algorithm, the channel-wise contribution weights are derived from the similarity scores between two image embeddings. The activation maps are linearly combined with the corresponding normalized contribution weights, forming the explanation map for visualization. The quantitative and qualitative experiments conducted on ten deep learning models for few-shot image classification, contrastive learning image classification, and image retrieval tasks demonstrate the effectiveness of the proposed FAM algorithm.
    On the Effectiveness of Speech Self-supervised Learning for Music. (arXiv:2307.05161v1 [cs.SD])
    Self-supervised learning (SSL) has shown promising results in various speech and natural language processing applications. However, its efficacy in music information retrieval (MIR) still remains largely unexplored. While previous SSL models pre-trained on music recordings may have been mostly closed-sourced, recent speech models such as wav2vec2.0 have shown promise in music modelling. Nevertheless, research exploring the effectiveness of applying speech SSL models to music recordings has been limited. We explore the music adaption of SSL with two distinctive speech-related models, data2vec1.0 and Hubert, and refer to them as music2vec and musicHuBERT, respectively. We train $12$ SSL models with 95M parameters under various pre-training configurations and systematically evaluate the MIR task performances with 13 different MIR tasks. Our findings suggest that training with music data can generally improve performance on MIR tasks, even when models are trained using paradigms designed for speech. However, we identify the limitations of such existing speech-oriented designs, especially in modelling polyphonic information. Based on the experimental results, empirical suggestions are also given for designing future musical SSL strategies and paradigms.
    CREPE: Learnable Prompting With CLIP Improves Visual Relationship Prediction. (arXiv:2307.04838v1 [cs.CV])
    In this paper, we explore the potential of Vision-Language Models (VLMs), specifically CLIP, in predicting visual object relationships, which involves interpreting visual features from images into language-based relations. Current state-of-the-art methods use complex graphical models that utilize language cues and visual features to address this challenge. We hypothesize that the strong language priors in CLIP embeddings can simplify these graphical models, paving the way for a simpler approach. We adopt the UVTransE relation prediction framework, which learns the relation as a translational embedding with subject, object, and union box embeddings from a scene. We systematically explore the design of CLIP-based subject, object, and union-box representations within the UVTransE framework and propose CREPE (CLIP Representation Enhanced Predicate Estimation). CREPE utilizes text-based representations for all three bounding boxes and introduces a novel contrastive training strategy to automatically infer the text prompt for the union-box. Our approach achieves state-of-the-art performance in predicate estimation, mR@5 27.79 and mR@20 31.95 on the Visual Genome benchmark, a 15.3% gain in performance over the recent state-of-the-art at mR@20. This work demonstrates CLIP's effectiveness in object relation prediction and encourages further research on VLMs in this challenging domain.
    A Theory of Bounded Inductive Rationality. (arXiv:2307.05068v1 [cs.AI])
    The dominant theories of rational choice assume logical omniscience. That is, they assume that when facing a decision problem, an agent can perform all relevant computations and determine the truth value of all relevant logical/mathematical claims. This assumption is unrealistic when, for example, we offer bets on remote digits of pi or when an agent faces a computationally intractable planning problem. Furthermore, the assumption of logical omniscience creates contradictions in cases where the environment can contain descriptions of the agent itself. Importantly, strategic interactions as studied in game theory are decision problems in which a rational agent is predicted by its environment (the other players). In this paper, we develop a theory of rational decision making that does not assume logical omniscience. We consider agents who repeatedly face decision problems (including ones like betting on digits of pi or games against other agents). The main contribution of this paper is to provide a sensible theory of rationality for such agents. Roughly, we require that a boundedly rational inductive agent tests each efficiently computable hypothesis infinitely often and follows those hypotheses that keep their promises of high rewards. We then prove that agents that are rational in this sense have other desirable properties. For example, they learn to value random and pseudo-random lotteries at their expected reward. Finally, we consider strategic interactions between different agents and prove a folk theorem for what strategies bounded rational inductive agents can converge to.
    Enhancing Continuous Time Series Modelling with a Latent ODE-LSTM Approach. (arXiv:2307.05126v1 [cs.LG])
    Due to their dynamic properties such as irregular sampling rate and high-frequency sampling, Continuous Time Series (CTS) are found in many applications. Since CTS with irregular sampling rates are difficult to model with standard Recurrent Neural Networks (RNNs), RNNs have been generalised to have continuous-time hidden dynamics defined by a Neural Ordinary Differential Equation (Neural ODE), leading to the ODE-RNN model. Another approach that provides better modelling is the Latent ODE model, which constructs a continuous-time model where a latent state is defined at all times. The Latent ODE model uses a standard RNN as the encoder and a Neural ODE as the decoder. However, since the RNN encoder leads to difficulties with missing data and ill-defined latent variables, a Latent ODE-RNN model has recently been proposed that uses an ODE-RNN model as the encoder instead. Both the Latent ODE and Latent ODE-RNN models are difficult to train due to the vanishing and exploding gradients problem. To overcome this problem, the main contribution of this paper is to propose and illustrate a new model based on a Latent ODE that uses an ODE-LSTM (Long Short-Term Memory) network as the encoder -- the Latent ODE-LSTM model. To limit the growth of the gradients, the Norm Gradient Clipping strategy was embedded in the Latent ODE-LSTM model. The performance of the new Latent ODE-LSTM (with and without Norm Gradient Clipping) for modelling CTS with regular and irregular sampling rates is then demonstrated. Numerical experiments show that the new Latent ODE-LSTM performs better than Latent ODE-RNNs and can avoid the vanishing and exploding gradients during training.
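    Norm gradient clipping itself is a standard one-line addition to a training loop; a minimal PyTorch sketch is shown below, with a stand-in LSTM rather than the paper's Latent ODE-LSTM.
    ```python
    # A minimal sketch of norm gradient clipping in a generic training loop;
    # the model here is a stand-in, not the paper's Latent ODE-LSTM.
    import torch
    import torch.nn as nn

    model = nn.LSTM(input_size=4, hidden_size=32, batch_first=True)
    head = nn.Linear(32, 1)
    params = list(model.parameters()) + list(head.parameters())
    opt = torch.optim.Adam(params, lr=1e-3)

    x = torch.randn(8, 50, 4)            # (batch, time, features)
    y = torch.randn(8, 1)

    for step in range(100):
        out, _ = model(x)
        loss = nn.functional.mse_loss(head(out[:, -1]), y)
        opt.zero_grad()
        loss.backward()
        # Rescale gradients so their global L2 norm never exceeds 1.0,
        # guarding against exploding gradients.
        torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)
        opt.step()
    ```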
    TIAM -- A Metric for Evaluating Alignment in Text-to-Image Generation. (arXiv:2307.05134v1 [cs.CV])
    The progress in the generation of synthetic images has made it crucial to assess their quality. While several metrics have been proposed to assess the rendering of images, it is crucial for Text-to-Image (T2I) models, which generate images based on a prompt, to consider additional aspects, such as the extent to which the generated image matches the important content of the prompt. Moreover, although the generated images usually result from a random starting point, the influence of this starting point is generally not considered. In this article, we propose a new metric based on prompt templates to study the alignment between the content specified in the prompt and the corresponding generated images. It allows us to better characterize the alignment in terms of the type of the specified objects, their number, and their color. We conducted a study of several recent T2I models covering various aspects. An additional interesting result we obtained with our approach is that image quality can vary drastically depending on the latent noise used as a seed for the images. We also quantify the influence of the number of concepts in the prompt, their order, and their (color) attributes. Finally, our method allows us to identify some latent seeds that produce better images than others, opening novel directions of research on this understudied topic.
    Diagnosing Model Performance Under Distribution Shift. (arXiv:2303.02011v4 [stat.ML] UPDATED)
    Prediction models can perform poorly when deployed to target distributions different from the training distribution. To understand these operational failure modes, we develop a method, called DIstribution Shift DEcomposition (DISDE), to attribute a drop in performance to different types of distribution shifts. Our approach decomposes the performance drop into terms for 1) an increase in harder but frequently seen examples from training, 2) changes in the relationship between features and outcomes, and 3) poor performance on examples infrequent or unseen during training. These terms are defined by fixing a distribution on $X$ while varying the conditional distribution of $Y \mid X$ between training and target, or by fixing the conditional distribution of $Y \mid X$ while varying the distribution on $X$. In order to do this, we define a hypothetical distribution on $X$ consisting of values common in both training and target, over which it is easy to compare $Y \mid X$ and thus predictive performance. We estimate performance on this hypothetical distribution via reweighting methods. Empirically, we show how our method can 1) inform potential modeling improvements across distribution shifts for employment prediction on tabular census data, and 2) help to explain why certain domain adaptation methods fail to improve model performance for satellite image classification.
    Benchmarking Algorithms for Federated Domain Generalization. (arXiv:2307.04942v1 [cs.LG])
    While prior domain generalization (DG) benchmarks consider train-test dataset heterogeneity, we evaluate Federated DG, which introduces federated learning (FL) specific challenges. Additionally, we explore domain-based heterogeneity in clients' local datasets -- a realistic Federated DG scenario. Prior Federated DG evaluations are limited in terms of the number or heterogeneity of clients and dataset diversity. To address this gap, we propose a Federated DG benchmark methodology that enables control of the number and heterogeneity of clients and provides metrics for dataset difficulty. We then apply our methodology to evaluate 13 Federated DG methods, which include centralized DG methods adapted to the FL context, FL methods that handle client heterogeneity, and methods designed specifically for Federated DG. Our results suggest that despite some progress, there remain significant performance gaps in Federated DG, particularly when evaluating with a large number of clients, high client heterogeneity, or more realistic datasets. Please check our extendable benchmark code here: https://github.com/inouye-lab/FedDG_Benchmark.
    Benchmarking Bayesian Causal Discovery Methods for Downstream Treatment Effect Estimation. (arXiv:2307.04988v1 [cs.LG])
    The practical utility of causality in decision-making is widely recognized, with causal discovery and inference being inherently intertwined. Nevertheless, a notable gap exists in the evaluation of causal discovery methods, where insufficient emphasis is placed on downstream inference. To address this gap, we evaluate six established baseline causal discovery methods and a newly proposed method based on GFlowNets, on the downstream task of treatment effect estimation. Through the implementation of a robust evaluation procedure, we offer valuable insights into the efficacy of these causal discovery methods for treatment effect estimation, considering both synthetic and real-world scenarios, as well as low-data scenarios. Furthermore, the results of our study demonstrate that GFlowNets possess the capability to effectively capture a wide range of useful and diverse ATE modes.
    Onion Universe Algorithm: Applications in Weakly Supervised Learning. (arXiv:2307.04870v1 [cs.LG])
    We introduce Onion Universe Algorithm (OUA), a novel classification method in ensemble learning. In particular, we show its applicability as a label model for weakly supervised learning. OUA offers simplicity in implementation, computational efficiency, and does not rely on any assumptions regarding the data or weak signals. The model is well suited for scenarios where fully labeled data is not available. Our method is built upon geometrical interpretation of the space spanned by weak signals. Empirical results support our analysis of the hidden geometric structure underlying general set of weak signals and also illustrates that OUA works well in practice. We show empirical evidence that OUA performs favorably on common benchmark datasets compared to existing label models for weakly supervised learning.
    Reinforcement Learning with Non-Cumulative Objective. (arXiv:2307.04957v1 [cs.LG])
    In reinforcement learning, the objective is almost always defined as a cumulative function over the rewards along the process. However, there are many optimal control and reinforcement learning problems in various application fields, especially in communications and networking, where the objectives are not naturally expressed as summations of the rewards. In this paper, we recognize the prevalence of non-cumulative objectives in various problems, and propose a modification to existing algorithms for optimizing such objectives. Specifically, we dive into the fundamental building block for many optimal control and reinforcement learning algorithms: the Bellman optimality equation. To optimize a non-cumulative objective, we replace the original summation operation in the Bellman update rule with a generalized operation corresponding to the objective. Furthermore, we provide sufficient conditions on the form of the generalized operation as well as assumptions on the Markov decision process under which the globally optimal convergence of the generalized Bellman updates can be guaranteed. We demonstrate the idea experimentally with the bottleneck objective, i.e., the objectives determined by the minimum reward along the process, on classical optimal control and reinforcement learning tasks, as well as on two network routing problems on maximizing the flow rates.
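    For the bottleneck objective, the generalized Bellman update replaces the usual sum of reward and discounted future value with a minimum; the toy value-iteration sketch below, on a random deterministic MDP, is our illustration assuming this form of the update.
    ```python
    # A minimal sketch of a generalized Bellman update for the bottleneck
    # objective: the value of a path is its minimum reward, so the sum in the
    # standard update becomes a min. Toy deterministic transitions assumed.
    import numpy as np

    n_states, n_actions = 5, 2
    rng = np.random.default_rng(0)
    R = rng.uniform(0.0, 1.0, (n_states, n_actions))         # per-step rewards
    NEXT = rng.integers(0, n_states, (n_states, n_actions))  # deterministic transitions

    # Optimistic initialization: bottleneck values only ever shrink below rewards.
    Q = np.full((n_states, n_actions), np.inf)
    for _ in range(200):                                     # value-iteration sweeps
        for s in range(n_states):
            for a in range(n_actions):
                # Standard update would be: R[s, a] + gamma * Q[NEXT[s, a]].max().
                # Bottleneck update: min of the immediate reward and future value.
                Q[s, a] = min(R[s, a], Q[NEXT[s, a]].max())
    print(np.round(Q, 3))
    ```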
    Improving Fairness of Graph Neural Networks: A Graph Counterfactual Perspective. (arXiv:2307.04937v1 [cs.LG])
    Graph neural networks (GNNs) have shown great ability in representation learning on graphs, facilitating various tasks. Despite their great performance in modeling graphs, recent works show that GNNs tend to inherit and amplify the bias from training data, causing concerns about the adoption of GNNs in high-stake scenarios. Hence, many efforts have been taken towards fairness-aware GNNs. However, most existing fair GNNs learn fair node representations by adopting statistical fairness notions, which may fail to alleviate bias in the presence of statistical anomalies. Motivated by causal theory, there are several attempts utilizing graph counterfactual fairness to mitigate root causes of unfairness. However, these methods suffer from non-realistic counterfactuals obtained by perturbation or generation. In this paper, we take a causal view on the fair graph learning problem. Guided by the causal analysis, we propose a novel framework, CAF, which can select counterfactuals from training data to avoid non-realistic counterfactuals and adopt the selected counterfactuals to learn fair node representations for the node classification task. Extensive experiments on synthetic and real-world datasets show the effectiveness of CAF.
    DDGM: Solving inverse problems by Diffusive Denoising of Gradient-based Minimization. (arXiv:2307.04946v1 [cs.CV])
    Inverse problems generally require a regularizer or prior for a good solution. A recent trend is to train a convolutional net to denoise images, and use this net as a prior when solving the inverse problem. Several proposals depend on a singular value decomposition of the forward operator, and several others backpropagate through the denoising net at runtime. Here we propose a simpler approach that combines the traditional gradient-based minimization of reconstruction error with denoising. Noise is also added at each step, so the iterative dynamics resembles a Langevin or diffusion process. Both the level of added noise and the size of the denoising step decay exponentially with time. We apply our method to the problem of tomographic reconstruction from electron micrographs acquired at multiple tilt angles. With empirical studies using simulated tilt views, we find parameter settings for our method that produce good results. We show that high accuracy can be achieved with as few as 50 denoising steps. We also compare with DDRM and DPS, more complex diffusion methods of the kinds mentioned above. These methods are less accurate (as measured by MSE and SSIM) for our tomography problem, even after the generation hyperparameters are optimized. Finally, we extend our method to the reconstruction of arbitrary-sized images and show results on 128 $\times$ 1568 pixel images.
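    The overall iteration can be sketched generically; the example below assumes a linear forward operator and a placeholder denoiser (both our stand-ins), with the noise level and denoising step decaying exponentially as described.
    ```python
    # A minimal sketch of the described loop: gradient step on reconstruction
    # error, a denoising step, and injected noise, with exponentially decaying
    # schedules. `denoise(x, sigma)` is a hypothetical placeholder for any
    # trained denoiser.
    import numpy as np

    def reconstruct(A, b, denoise, x0, n_steps=50, lr=0.02, sigma0=0.5, decay=0.9):
        """A: linear forward operator (matrix); b: measurements."""
        x = x0.copy()
        for t in range(n_steps):
            sigma = sigma0 * decay ** t                        # exponential decay
            x = x - lr * A.T @ (A @ x - b)                     # data-fidelity gradient
            x = x + sigma * (denoise(x, sigma) - x)            # shrinking denoise step
            x = x + sigma * np.random.default_rng(t).normal(size=x.shape)  # Langevin-like noise
        return x

    # Toy use: an identity "denoiser" reduces this to noisy gradient descent.
    A = np.random.default_rng(0).normal(size=(30, 10))
    x_true = np.ones(10)
    b = A @ x_true
    print(np.round(reconstruct(A, b, lambda x, s: x, np.zeros(10)), 2))
    ```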
    SigOpt Mulch: An Intelligent System for AutoML of Gradient Boosted Trees. (arXiv:2307.04849v1 [cs.LG])
    Gradient boosted trees (GBTs) are ubiquitous models used by researchers, machine learning (ML) practitioners, and data scientists because of their robust performance, interpretable behavior, and ease-of-use. One critical challenge in training GBTs is the tuning of their hyperparameters. In practice, selecting these hyperparameters is often done manually. Recently, the ML community has advocated for tuning hyperparameters through black-box optimization and developed state-of-the-art systems to do so. However, applying such systems to tune GBTs suffers from two drawbacks. First, these systems are not model-aware; rather, they are designed to apply to a generic model, which leaves significant optimization performance on the table. Second, using these systems requires domain knowledge, such as the choice of hyperparameter search space, which is an antithesis to the automatic experimentation that black-box optimization aims to provide. In this paper, we present SigOpt Mulch, a model-aware hyperparameter tuning system specifically designed for automated tuning of GBTs that provides two improvements over existing systems. First, Mulch leverages powerful techniques in metalearning and multifidelity optimization to perform model-aware hyperparameter optimization. Second, it automates the process of learning performant hyperparameters by making intelligent decisions about the optimization search space, thus reducing the need for user domain knowledge. These innovations allow Mulch to identify good GBT hyperparameters far more efficiently -- and in a more seamless and user-friendly way -- than existing black-box hyperparameter tuning systems.
    Dynamics of Temporal Difference Reinforcement Learning. (arXiv:2307.04841v1 [stat.ML])
    Reinforcement learning has been successful across several applications in which agents have to learn to act in environments with sparse feedback. However, despite this empirical success there is still a lack of theoretical understanding of how the parameters of reinforcement learning models and the features used to represent states interact to control the dynamics of learning. In this work, we use concepts from statistical physics, to study the typical case learning curves for temporal difference learning of a value function with linear function approximators. Our theory is derived under a Gaussian equivalence hypothesis where averages over the random trajectories are replaced with temporally correlated Gaussian feature averages and we validate our assumptions on small scale Markov Decision Processes. We find that the stochastic semi-gradient noise due to subsampling the space of possible episodes leads to significant plateaus in the value error, unlike in traditional gradient descent dynamics. We study how learning dynamics and plateaus depend on feature structure, learning rate, discount factor, and reward function. We then analyze how strategies like learning rate annealing and reward shaping can favorably alter learning dynamics and plateaus. To conclude, our work introduces new tools to open a new direction towards developing a theory of learning dynamics in reinforcement learning.
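    The object under study, semi-gradient TD(0) with linear function approximation, can be written compactly; below is a minimal sketch on a toy random-walk chain.
    ```python
    # A minimal sketch of semi-gradient TD(0) with linear function
    # approximation on a toy random-walk Markov chain.
    import numpy as np

    rng = np.random.default_rng(0)
    n_states, n_features = 10, 4
    Phi = rng.normal(size=(n_states, n_features))    # fixed random features
    w = np.zeros(n_features)
    gamma, lr = 0.9, 0.05

    s = 0
    for step in range(20000):
        s_next = (s + rng.choice([-1, 1])) % n_states  # random-walk transition
        reward = 1.0 if s_next == 0 else 0.0
        # Semi-gradient TD(0): the bootstrap target uses the current weights.
        td_error = reward + gamma * Phi[s_next] @ w - Phi[s] @ w
        w += lr * td_error * Phi[s]
        s = s_next

    print("Learned state values:", np.round(Phi @ w, 2))
    ```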
    Learning to Solve Constraint Satisfaction Problems with Recurrent Transformer. (arXiv:2307.04895v1 [cs.AI])
    Constraint satisfaction problems (CSPs) are about finding values of variables that satisfy the given constraints. We show that Transformer extended with recurrence is a viable approach to learning to solve CSPs in an end-to-end manner, having clear advantages over state-of-the-art methods such as Graph Neural Networks, SATNet, and some neuro-symbolic models. With the ability of Transformer to handle visual input, the proposed Recurrent Transformer can straightforwardly be applied to visual constraint reasoning problems while successfully addressing the symbol grounding problem. We also show how to leverage deductive knowledge of discrete constraints in the Transformer's inductive learning to achieve sample-efficient learning and semi-supervised learning for CSPs.
    Route, Interpret, Repeat: Blurring the line between post hoc explainability and interpretable models. (arXiv:2307.05350v1 [cs.LG])
    The current approach to ML model design is either to choose a flexible Blackbox model and explain it post hoc or to start with an interpretable model. Blackbox models are flexible but difficult to explain, whereas interpretable models are designed to be explainable. However, developing interpretable models necessitates extensive ML knowledge, and the resulting models tend to be less flexible, offering potentially subpar performance compared to their Blackbox equivalents. This paper aims to blur the distinction between a post hoc explanation of a Blackbox and constructing interpretable models. We propose beginning with a flexible Blackbox model and gradually carving out a mixture of interpretable models and a residual network. Our design identifies a subset of samples and routes them through the interpretable models. The remaining samples are routed through a flexible residual network. We adopt First Order Logic (FOL) as the interpretable model's backbone, which provides basic reasoning on concepts retrieved from the Blackbox model. On the residual network, we repeat the method until the proportion of data explained by the residual network falls below a desired threshold. Our approach offers several advantages. First, the mixture of interpretable and flexible residual networks results in almost no compromise in performance. Second, the route, interpret, and repeat approach yields a highly flexible interpretable model. Our extensive experiments demonstrate the performance of the model on various datasets. We show that by editing the FOL model, we can fix the shortcut learned by the original Blackbox model. Finally, our method provides a framework for a hybrid symbolic-connectionist network that is simple to train and adaptable to many applications.
    SHAP@k: Efficient and Probably Approximately Correct (PAC) Identification of Top-k Features. (arXiv:2307.04850v1 [cs.LG])
    The SHAP framework provides a principled method to explain the predictions of a model by computing feature importance. Motivated by applications in finance, we introduce the Top-k Identification Problem (TkIP), where the objective is to identify the k features with the highest SHAP values. While any method to compute SHAP values with uncertainty estimates (such as KernelSHAP and SamplingSHAP) can be trivially adapted to solve TkIP, doing so is highly sample inefficient. The goal of our work is to improve the sample efficiency of existing methods in the context of solving TkIP. Our key insight is that TkIP can be framed as an Explore-m problem--a well-studied problem related to multi-armed bandits (MAB). This connection enables us to improve sample efficiency by leveraging two techniques from the MAB literature: (1) a better stopping-condition (to stop sampling) that identifies when PAC (Probably Approximately Correct) guarantees have been met and (2) a greedy sampling scheme that judiciously allocates samples between different features. By adopting these methods we develop KernelSHAP@k and SamplingSHAP@k to efficiently solve TkIP, offering an average improvement of $5\times$ in sample-efficiency and runtime across most common credit related datasets.
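    The Explore-m framing can be illustrated with a generic confidence-interval stopping rule; the sketch below is our illustration using a noisy importance oracle, not KernelSHAP@k or SamplingSHAP@k themselves.
    ```python
    # A minimal sketch of top-k identification with adaptive sampling: refine
    # estimates of the features straddling the top-k boundary until their
    # confidence intervals separate (an Explore-m style stopping rule).
    import numpy as np

    def top_k_identify(sample_importance, n_features, k, eps=0.05, max_iter=10000):
        """sample_importance(j) returns one noisy estimate of feature j's value."""
        sums = np.zeros(n_features)
        counts = np.zeros(n_features)
        for j in range(n_features):                 # one initial sample per feature
            sums[j], counts[j] = sample_importance(j), 1
        for _ in range(max_iter):
            means = sums / counts
            width = np.sqrt(2 * np.log(1 / 0.01) / counts)   # Hoeffding-style CI
            order = np.argsort(-means)
            top, rest = order[:k], order[k:]
            # Stop when the k-th feature's lower bound clears the (k+1)-th's
            # upper bound (up to slack eps).
            if means[top[-1]] - width[top[-1]] >= means[rest[0]] + width[rest[0]] - eps:
                return set(top.tolist())
            for j in (top[-1], rest[0]):            # refine the boundary features
                sums[j] += sample_importance(j)
                counts[j] += 1
        return set(np.argsort(-sums / counts)[:k].tolist())

    true_vals = np.array([1.0, 0.8, 0.5, 0.2, 0.1])
    rng = np.random.default_rng(0)
    print(top_k_identify(lambda j: true_vals[j] + rng.normal(0, 0.3), 5, k=2))
    ```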
    Fed-CPrompt: Contrastive Prompt for Rehearsal-Free Federated Continual Learning. (arXiv:2307.04869v1 [cs.LG])
    Federated continual learning (FCL) learns incremental tasks over time from confidential datasets distributed across clients. This paper focuses on rehearsal-free FCL, which has severe forgetting issues when learning new tasks due to the lack of access to historical task data. To address this issue, we propose Fed-CPrompt, based on prompt learning techniques, to obtain task-specific prompts in a communication-efficient way. Fed-CPrompt introduces two key components, asynchronous prompt learning and contrastive continual loss, to handle asynchronous task arrival and heterogeneous data distributions in FCL, respectively. Extensive experiments demonstrate the effectiveness of Fed-CPrompt in achieving SOTA rehearsal-free FCL performance.
    VisText: A Benchmark for Semantically Rich Chart Captioning. (arXiv:2307.05356v1 [cs.CV])
    Captions that describe or explain charts help improve recall and comprehension of the depicted data and provide a more accessible medium for people with visual disabilities. However, current approaches for automatically generating such captions struggle to articulate the perceptual or cognitive features that are the hallmark of charts (e.g., complex trends and patterns). In response, we introduce VisText: a dataset of 12,441 pairs of charts and captions that describe the charts' construction, report key statistics, and identify perceptual and cognitive phenomena. In VisText, a chart is available as three representations: a rasterized image, a backing data table, and a scene graph -- a hierarchical representation of a chart's visual elements akin to a web page's Document Object Model (DOM). To evaluate the impact of VisText, we fine-tune state-of-the-art language models on our chart captioning task and apply prefix-tuning to produce captions that vary the semantic content they convey. Our models generate coherent, semantically rich captions and perform on par with state-of-the-art chart captioning models across machine translation and text generation metrics. Through qualitative analysis, we identify six broad categories of errors that our models make that can inform future work.
    Forming Trees with Treeformers. (arXiv:2207.06960v2 [cs.CL] UPDATED)
    Human language is known to exhibit a nested, hierarchical structure, allowing us to form complex sentences out of smaller pieces. However, many state-of-the-art neural network models such as Transformers have no explicit hierarchical structure in their architecture -- that is, they don't have an inductive bias toward hierarchical structure. Additionally, Transformers are known to perform poorly on compositional generalization tasks, which require such structures. In this paper, we introduce Treeformer, a general-purpose encoder module inspired by the CKY algorithm which learns a composition operator and pooling function to construct hierarchical encodings for phrases and sentences. Our extensive experiments demonstrate the benefits of incorporating hierarchical structure into the Transformer and show significant improvements in compositional generalization as well as in downstream tasks such as machine translation, abstractive summarization, and various natural language understanding tasks.
    Decentralized Federated Learning: Fundamentals, State of the Art, Frameworks, Trends, and Challenges. (arXiv:2211.08413v3 [cs.LG] UPDATED)
    In the last decade, Federated Learning (FL) has gained relevance in training collaborative models without sharing sensitive data. Since its birth, Centralized FL (CFL) has been the most common approach in the literature, where a central entity creates a global model. However, a centralized approach leads to increased latency due to bottlenecks, heightened vulnerability to system failures, and trustworthiness concerns affecting the entity responsible for the global model creation. Decentralized Federated Learning (DFL) emerged to address these concerns by promoting decentralized model aggregation and minimizing reliance on centralized architectures. However, despite the work done in DFL, the literature has not (i) studied the main aspects differentiating DFL and CFL; (ii) analyzed DFL frameworks to create and evaluate new solutions; and (iii) reviewed application scenarios using DFL. Thus, this article identifies and analyzes the main fundamentals of DFL in terms of federation architectures, topologies, communication mechanisms, security approaches, and key performance indicators. Additionally, the paper at hand explores existing mechanisms to optimize critical DFL fundamentals. Then, the most relevant features of the current DFL frameworks are reviewed and compared. After that, it analyzes the most used DFL application scenarios, identifying solutions based on the fundamentals and frameworks previously defined. Finally, the evolution of existing DFL solutions is studied to provide a list of trends, lessons learned, and open challenges.
    DRMC: A Generalist Model with Dynamic Routing for Multi-Center PET Image Synthesis. (arXiv:2307.05249v1 [eess.IV])
    Multi-center positron emission tomography (PET) image synthesis aims at recovering low-dose PET images from multiple different centers. The generalizability of existing methods can still be suboptimal for a multi-center study due to domain shifts, which result from non-identical data distribution among centers with different imaging systems/protocols. While some approaches address domain shifts by training specialized models for each center, they are parameter inefficient and do not well exploit the shared knowledge across centers. To address this, we develop a generalist model that shares architecture and parameters across centers to utilize the shared knowledge. However, the generalist model can suffer from the center interference issue, i.e., the gradient directions of different centers can be inconsistent or even opposite owing to the non-identical data distribution. To mitigate such interference, we introduce a novel dynamic routing strategy with cross-layer connections that routes data from different centers to different experts. Experiments show that our generalist model with dynamic routing (DRMC) exhibits excellent generalizability across centers. Code and data are available at: https://github.com/Yaziwel/Multi-Center-PET-Image-Synthesis.
    Combating Data Imbalances in Federated Semi-supervised Learning with Dual Regulators. (arXiv:2307.05358v1 [cs.LG])
    Federated learning has become a popular method to learn from decentralized heterogeneous data. Federated semi-supervised learning (FSSL) emerges to train models from a small fraction of labeled data due to label scarcity on decentralized clients. Existing FSSL methods assume independent and identically distributed (IID) labeled data across clients and consistent class distribution between labeled and unlabeled data within a client. This work studies a more practical and challenging scenario of FSSL, where data distribution is different not only across clients but also within a client between labeled and unlabeled data. To address this challenge, we propose a novel FSSL framework with dual regulators, FedDure. FedDure lifts the previous assumption with a coarse-grained regulator (C-reg) and a fine-grained regulator (F-reg): C-reg regularizes the updating of the local model by tracking the learning effect on labeled data distribution; F-reg learns an adaptive weighting scheme tailored for unlabeled instances in each client. We further formulate the client model training as bi-level optimization that adaptively optimizes the model in the client with two regulators. Theoretically, we show the convergence guarantee of the dual regulators. Empirically, we demonstrate that FedDure is superior to the existing methods across a wide range of settings, notably by more than 11% on CIFAR-10 and CINIC-10 datasets.
    On the Need for a Language Describing Distribution Shifts: Illustrations on Tabular Datasets. (arXiv:2307.05284v1 [cs.LG])
    Different distribution shifts require different algorithmic and operational interventions. Methodological research must be grounded by the specific shifts they address. Although nascent benchmarks provide a promising empirical foundation, they implicitly focus on covariate shifts, and the validity of empirical findings depends on the type of shift, e.g., previous observations on algorithmic performance can fail to be valid when the $Y|X$ distribution changes. We conduct a thorough investigation of natural shifts in 5 tabular datasets over 86,000 model configurations, and find that $Y|X$-shifts are most prevalent. To encourage researchers to develop a refined language for distribution shifts, we build WhyShift, an empirical testbed of curated real-world shifts where we characterize the type of shift we benchmark performance over. Since $Y|X$-shifts are prevalent in tabular settings, we identify covariate regions that suffer the biggest $Y|X$-shifts and discuss implications for algorithmic and data-based interventions. Our testbed highlights the importance of future research that builds an understanding of how distributions differ.
    Supervised Attention Using Homophily in Graph Neural Networks. (arXiv:2307.05217v1 [cs.LG])
    Graph neural networks have become the standard approach for dealing with learning problems on graphs. Among the different variants of graph neural networks, graph attention networks (GATs) have been applied with great success to different tasks. In the GAT model, each node assigns an importance score to its neighbors using an attention mechanism. However, similar to other graph neural networks, GATs aggregate messages from nodes that belong to different classes, and therefore produce node representations that are not well separated with respect to the different classes, which might hurt their performance. In this work, to alleviate this problem, we propose a new technique that can be incorporated into any graph attention model to encourage higher attention scores between nodes that share the same class label. We evaluate the proposed method on several node classification datasets demonstrating increased performance over standard baseline models.
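    One simple way to realize such supervision (our illustration, not necessarily the paper's exact loss) is an auxiliary binary cross-entropy term that pushes attention toward 1 on same-class edges and toward 0 on cross-class edges.
    ```python
    # A minimal sketch of homophily-supervised attention: an auxiliary loss on
    # per-edge attention scores against a same-class indicator. In practice
    # this would be applied only on edges between labeled (training) nodes.
    import torch

    def homophily_attention_loss(attention, edge_index, labels):
        """attention: (E,) attention score per edge in (0, 1);
        edge_index: (2, E) source/target node ids; labels: (N,) class labels."""
        src, dst = edge_index
        same_class = (labels[src] == labels[dst]).float()
        # Same-class edges are pushed toward high attention, cross-class toward low.
        return torch.nn.functional.binary_cross_entropy(attention, same_class)

    edge_index = torch.tensor([[0, 0, 1, 2], [1, 2, 2, 3]])
    labels = torch.tensor([0, 0, 1, 1])
    attention = torch.sigmoid(torch.randn(4, requires_grad=True))
    print(homophily_attention_loss(attention, edge_index, labels))
    ```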
    Geometric Neural Diffusion Processes. (arXiv:2307.05431v1 [stat.ML])
    Denoising diffusion models have proven to be a flexible and effective paradigm for generative modelling. Their recent extension to infinite dimensional Euclidean spaces has allowed for the modelling of stochastic processes. However, many problems in the natural sciences incorporate symmetries and involve data living in non-Euclidean spaces. In this work, we extend the framework of diffusion models to incorporate a series of geometric priors in infinite-dimension modelling. We do so by a) constructing a noising process which admits, as limiting distribution, a geometric Gaussian process that transforms under the symmetry group of interest, and b) approximating the score with a neural network that is equivariant w.r.t. this group. We show that with these conditions, the generative functional model admits the same symmetry. We demonstrate scalability and capacity of the model, using a novel Langevin-based conditional sampler, to fit complex scalar and vector fields, with Euclidean and spherical codomain, on synthetic and real-world weather data.
    Leveraging an Alignment Set in Tackling Instance-Dependent Label Noise. (arXiv:2307.04868v1 [cs.LG])
    Noisy training labels can hurt model performance. Most approaches that aim to address label noise assume label noise is independent of the input features. In practice, however, label noise is often feature- or instance-dependent, and therefore biased (i.e., some instances are more likely to be mislabeled than others). E.g., in clinical care, female patients are more likely to be under-diagnosed for cardiovascular disease compared to male patients. Approaches that ignore this dependence can produce models with poor discriminative performance, and in many healthcare settings, can exacerbate issues around health disparities. In light of these limitations, we propose a two-stage approach to learn in the presence of instance-dependent label noise. Our approach utilizes anchor points, a small subset of data for which we know the observed and ground truth labels. On several tasks, our approach leads to consistent improvements over the state-of-the-art in discriminative performance (AUROC) while mitigating bias (area under the equalized odds curve, AUEOC). For example, when predicting acute respiratory failure onset on the MIMIC-III dataset, our approach achieves a harmonic mean (AUROC and AUEOC) of 0.84 (SD [standard deviation] 0.01) while that of the next best baseline is 0.81 (SD 0.01). Overall, our approach improves accuracy while mitigating potential bias compared to existing approaches in the presence of instance-dependent label noise.
    PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR. (arXiv:2307.04995v1 [cs.LG])
    Deep neural networks (DNNs) are of critical use in different domains. To accelerate DNN computation, tensor compilers are proposed to generate efficient code on different domain-specific accelerators. Existing tensor compilers mainly focus on optimizing computation efficiency. However, memory access is becoming a key performance bottleneck because the computational performance of accelerators is increasing much faster than memory performance. The lack of direct description of memory access and data dependence in current tensor compilers' intermediate representation (IR) brings significant challenges to generating memory-efficient code. In this paper, we propose IntelliGen, a tensor compiler that can generate high-performance code for memory-intensive operators by considering both computation and data movement optimizations. IntelliGen represents a DNN program using GIR, which includes primitives indicating its computation, data movement, and parallel strategies. This information is further composed into an instruction-level dataflow graph to perform holistic optimizations by searching different memory access patterns and computation operations, and generating memory-efficient code for different hardware. We evaluate IntelliGen on NVIDIA GPU, AMD GPU, and Cambricon MLU, showing speedups of up to 1.97x, 2.93x, and 16.91x (1.28x, 1.23x, and 2.31x on average), respectively, compared to the current most performant frameworks.
    Estimating label quality and errors in semantic segmentation data via any model. (arXiv:2307.05080v1 [cs.LG])
The labor-intensive annotation process of semantic segmentation datasets is often prone to errors, since humans struggle to label every pixel correctly. We study algorithms to automatically detect such annotation errors, in particular methods to score label quality, such that the images with the lowest scores are least likely to be correctly labeled. This helps prioritize what data to review in order to ensure a high-quality training/evaluation dataset, which is critical in sensitive applications such as medical imaging and autonomous vehicles. Widely applicable, our label quality scores rely on probabilistic predictions from a trained segmentation model -- any model architecture and training procedure can be utilized. Here we study 7 different label quality scoring methods used in conjunction with a DeepLabV3+ or an FPN segmentation model to detect annotation errors in a version of the SYNTHIA dataset. Precision-recall evaluations reveal a score -- the soft-minimum of the model-estimated likelihoods of each pixel's annotated class -- that is particularly effective at identifying mislabeled images, across multiple types of annotation error.
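The scoring itself is simple enough to sketch. A minimal soft-minimum image-level score, assuming per-pixel class probabilities from any trained segmentation model; the paper's exact soft-minimum formulation may differ:

```python
import numpy as np

def softmin_label_quality(probs, labels, temperature=0.1):
    # probs: (H, W, C) per-pixel class probabilities; labels: (H, W) ints.
    # Returns a low score when even a few pixels' annotated classes are
    # highly unlikely under the model (temperature -> 0 gives the hard min).
    h, w = labels.shape
    lik = probs[np.arange(h)[:, None], np.arange(w)[None, :], labels].ravel()
    weights = np.exp(-lik / temperature)
    return float((weights / weights.sum() * lik).sum())

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(3), size=(4, 4))         # toy 4x4 image, 3 classes
print(softmin_label_quality(probs, probs.argmax(-1)))  # consistent labels score high
```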
    FedYolo: Augmenting Federated Learning with Pretrained Transformers. (arXiv:2307.04905v1 [cs.LG])
    The growth and diversity of machine learning applications motivate a rethinking of learning with mobile and edge devices. How can we address diverse client goals and learn with scarce heterogeneous data? While federated learning aims to address these issues, it has challenges hindering a unified solution. Large transformer models have been shown to work across a variety of tasks achieving remarkable few-shot adaptation. This raises the question: Can clients use a single general-purpose model, rather than custom models for each task, while obeying device and network constraints? In this work, we investigate pretrained transformers (PTF) to achieve these on-device learning goals and thoroughly explore the roles of model size and modularity, where the latter refers to adaptation through modules such as prompts or adapters. Focusing on federated learning, we demonstrate that: (1) Larger scale shrinks the accuracy gaps between alternative approaches and improves heterogeneity robustness. Scale allows clients to run more local SGD epochs which can significantly reduce the number of communication rounds. At the extreme, clients can achieve respectable accuracy locally highlighting the potential of fully-local learning. (2) Modularity, by design, enables $>$100$\times$ less communication in bits. Surprisingly, it also boosts the generalization capability of local adaptation methods and the robustness of smaller PTFs. Finally, it enables clients to solve multiple unrelated tasks simultaneously using a single PTF, whereas full updates are prone to catastrophic forgetting. These insights on scale and modularity motivate a new federated learning approach we call "You Only Load Once" (FedYolo): The clients load a full PTF model once and all future updates are accomplished through communication-efficient modules with limited catastrophic-forgetting, where each task is assigned to its own module.
    ClimaX: A foundation model for weather and climate. (arXiv:2301.10343v3 [cs.LG] UPDATED)
Most state-of-the-art approaches for weather and climate modeling are based on physics-informed numerical models of the atmosphere. These approaches aim to model the non-linear dynamics and complex interactions between multiple variables, which are challenging to approximate. Additionally, many such numerical models are computationally intensive, especially when modeling atmospheric phenomena at a fine-grained spatial and temporal resolution. Recent data-driven approaches based on machine learning instead aim to directly solve a downstream forecasting or projection task by learning a data-driven functional mapping using deep neural networks. However, these networks are trained using curated and homogeneous climate datasets for specific spatiotemporal tasks, and thus lack the generality of numerical models. We develop and demonstrate ClimaX, a flexible and generalizable deep learning model for weather and climate science that can be trained using heterogeneous datasets spanning different variables, spatio-temporal coverage, and physical groundings. ClimaX extends the Transformer architecture with novel encoding and aggregation blocks that allow effective use of available compute while maintaining general utility. ClimaX is pre-trained with a self-supervised learning objective on climate datasets derived from CMIP6. The pre-trained ClimaX can then be fine-tuned to address a breadth of climate and weather tasks, including those that involve atmospheric variables and spatio-temporal scales unseen during pretraining. Compared to existing data-driven baselines, we show that this generality in ClimaX results in superior performance on benchmarks for weather forecasting and climate projections, even when pretrained at lower resolutions and compute budgets. The source code is available at https://github.com/microsoft/ClimaX.
    Reject option models comprising out-of-distribution detection. (arXiv:2307.05199v1 [cs.LG])
    The optimal prediction strategy for out-of-distribution (OOD) setups is a fundamental question in machine learning. In this paper, we address this question and present several contributions. We propose three reject option models for OOD setups: the Cost-based model, the Bounded TPR-FPR model, and the Bounded Precision-Recall model. These models extend the standard reject option models used in non-OOD setups and define the notion of an optimal OOD selective classifier. We establish that all the proposed models, despite their different formulations, share a common class of optimal strategies. Motivated by the optimal strategy, we introduce double-score OOD methods that leverage uncertainty scores from two chosen OOD detectors: one focused on OOD/ID discrimination and the other on misclassification detection. The experimental results consistently demonstrate the superior performance of this simple strategy compared to state-of-the-art methods. Additionally, we propose novel evaluation metrics derived from the definition of the optimal strategy under the proposed OOD rejection models. These new metrics provide a comprehensive and reliable assessment of OOD methods without the deficiencies observed in existing evaluation approaches.
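As a toy illustration of the double-score idea, one simple instantiation abstains whenever either detector is too uncertain; the thresholds and the exact combination rule (which the paper derives from its reject-option models) are assumptions here:

```python
import numpy as np

def double_score_reject(ood_score, misclf_score, t_ood, t_mis):
    # Reject an input if either the OOD/ID detector or the misclassification
    # detector flags it; both scores read as "higher means more suspicious".
    return (ood_score > t_ood) | (misclf_score > t_mis)

rng = np.random.default_rng(0)
print(double_score_reject(rng.random(5), rng.random(5), 0.8, 0.9))
```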
    MAP- and MLE-Based Teaching. (arXiv:2307.05252v1 [cs.LG])
    Imagine a learner L who tries to infer a hidden concept from a collection of observations. Building on the work [4] of Ferri et al., we assume the learner to be parameterized by priors P(c) and by c-conditional likelihoods P(z|c) where c ranges over all concepts in a given class C and z ranges over all observations in an observation set Z. L is called a MAP-learner (resp. an MLE-learner) if it thinks of a collection S of observations as a random sample and returns the concept with the maximum a-posteriori probability (resp. the concept which maximizes the c-conditional likelihood of S). Depending on whether L assumes that S is obtained from ordered or unordered sampling resp. from sampling with or without replacement, we can distinguish four different sampling modes. Given a target concept c in C, a teacher for a MAP-learner L aims at finding a smallest collection of observations that causes L to return c. This approach leads in a natural manner to various notions of a MAP- or MLE-teaching dimension of a concept class C. Our main results are: We show that this teaching model has some desirable monotonicity properties. We clarify how the four sampling modes are related to each other. As for the (important!) special case, where concepts are subsets of a domain and observations are 0,1-labeled examples, we obtain some additional results. First of all, we characterize the MAP- and MLE-teaching dimension associated with an optimally parameterized MAP-learner graph-theoretically. From this central result, some other ones are easy to derive. It is shown, for instance, that the MLE-teaching dimension is either equal to the MAP-teaching dimension or exceeds the latter by 1. It is shown furthermore that these dimensions can be bounded from above by the so-called antichain number, the VC-dimension and related combinatorial parameters. Moreover they can be computed in polynomial time.
    Hybrid hidden Markov LSTM for short-term traffic flow prediction. (arXiv:2307.04954v1 [cs.LG])
Deep learning (DL) methods have outperformed parametric models such as the historical average, ARIMA, and variants in predicting traffic variables into the short and near-short future, which is critical for traffic management. Specifically, recurrent neural networks (RNNs) and their variants (e.g. long short-term memory) are designed to retain long-term temporal correlations and are therefore suitable for modeling sequences. In contrast, multi-regime models assume the traffic system evolves through multiple states (say, free-flow and congestion) with distinct characteristics, and hence separate models are trained to characterize the traffic dynamics within each regime. For instance, Markov-switching models with a hidden Markov model (HMM) for regime identification are capable of capturing complex dynamic patterns and non-stationarity. Interestingly, both HMMs and LSTMs can be used for modeling an observation sequence from a set of latent or hidden state variables. In an LSTM, the latent variable is computed in a deterministic manner from the current observation and the previous latent variable, while in an HMM the set of latent variables forms a Markov chain. Inspired by research in natural language processing, a hybrid hidden Markov-LSTM model capable of learning complementary features in traffic data is proposed for traffic flow prediction. Results indicate significant performance gains from the hybrid architecture compared to conventional methods such as Markov-switching ARIMA and LSTM.
    Measuring and Mitigating Interference in Reinforcement Learning. (arXiv:2307.04887v1 [cs.LG])
    Catastrophic interference is common in many network-based learning systems, and many proposals exist for mitigating it. Before overcoming interference we must understand it better. In this work, we provide a definition and novel measure of interference for value-based reinforcement learning methods such as Fitted Q-Iteration and DQN. We systematically evaluate our measure of interference, showing that it correlates with instability in control performance, across a variety of network architectures. Our new interference measure allows us to ask novel scientific questions about commonly used deep learning architectures and study learning algorithms which mitigate interference. Lastly, we outline a class of algorithms which we call online-aware that are designed to mitigate interference, and show they do reduce interference according to our measure and that they improve stability and performance in several classic control environments.
    Predicting small molecules solubilities on endpoint devices using deep ensemble neural networks. (arXiv:2307.05318v1 [physics.chem-ph])
    Aqueous solubility is a valuable yet challenging property to predict. Computing solubility using first-principles methods requires accounting for the competing effects of entropy and enthalpy, resulting in long computations for relatively poor accuracy. Data-driven approaches, such as deep learning, offer improved accuracy and computational efficiency but typically lack uncertainty quantification. Additionally, ease of use remains a concern for any computational technique, resulting in the sustained popularity of group-based contribution methods. In this work, we addressed these problems with a deep learning model with predictive uncertainty that runs on a static website (without a server). This approach moves computing needs onto the website visitor without requiring installation, removing the need to pay for and maintain servers. Our model achieves satisfactory results in solubility prediction. Furthermore, we demonstrate how to create molecular property prediction models that balance uncertainty and ease of use. The code is available at \url{https://github.com/ur-whitelab/mol.dev}, and the model is usable at \url{https://mol.dev}.
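A minimal sketch of the deep-ensemble recipe behind such predictive uncertainty, with toy stand-in models (the paper's actual networks and featurization are not reproduced here):

```python
import numpy as np

def ensemble_predict(models, x):
    # Deep ensemble: the mean over independently trained members is the
    # point estimate; the spread across members quantifies uncertainty.
    preds = np.stack([m(x) for m in models])
    return preds.mean(axis=0), preds.std(axis=0)

rng = np.random.default_rng(0)
# Toy stand-ins for independently trained solubility regressors.
models = [lambda x, w=rng.normal(1.0, 0.05): w * x for _ in range(5)]
mean, std = ensemble_predict(models, np.array([1.2, -0.3, 2.5]))
print(mean, std)   # per-input prediction and its uncertainty
```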
    Compact Twice Fusion Network for Edge Detection. (arXiv:2307.04952v1 [cs.CV])
The significance of multi-scale features has been gradually recognized by the edge detection community. However, the fusion of multi-scale features increases the complexity of the model, which is not friendly to practical application. In this work, we propose a Compact Twice Fusion Network (CTFN) to fully integrate multi-scale features while maintaining the compactness of the model. CTFN includes two lightweight multi-scale feature fusion modules: a Semantic Enhancement Module (SEM) that can utilize the semantic information contained in coarse-scale features to guide the learning of fine-scale features, and a Pseudo Pixel-level Weighting (PPW) module that aggregates the complementary merits of multi-scale features by assigning weights to all features. Notwithstanding all this, the interference of texture noise makes the correct classification of some pixels still a challenge. For these hard samples, we propose a novel loss function, coined Dynamic Focal Loss, which reshapes the standard cross-entropy loss and dynamically adjusts the weights to correct the distribution of hard samples. We evaluate our method on three datasets, i.e., BSDS500, NYUDv2, and BIPEDv2. Compared with state-of-the-art methods, CTFN achieves competitive accuracy with fewer parameters and lower computational cost. Apart from the backbone, CTFN requires only 0.1M additional parameters, which reduces its computation cost to just 60% of other state-of-the-art methods. The codes are available at https://github.com/Li-yachuan/CTFN-pytorch-master.
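A hedged sketch of a focal-style reweighted cross-entropy in the spirit of Dynamic Focal Loss, assuming binary edge maps; the paper's exact dynamic weighting scheme may differ:

```python
import torch
import torch.nn.functional as F

def dynamic_focal_loss(logits, targets, gamma=2.0):
    # Down-weight pixels the model already classifies confidently so that
    # hard samples dominate the gradient (gamma controls the reshaping).
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)    # prob. of the true class
    return ((1 - p_t) ** gamma * ce).mean()

logits = torch.randn(2, 1, 8, 8)                   # predicted edge maps
targets = (torch.rand(2, 1, 8, 8) > 0.9).float()   # sparse edge labels
print(dynamic_focal_loss(logits, targets))
```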
    Intrinsically motivated graph exploration using network theories of human curiosity. (arXiv:2307.04962v1 [cs.LG])
    Intrinsically motivated exploration has proven useful for reinforcement learning, even without additional extrinsic rewards. When the environment is naturally represented as a graph, how to guide exploration best remains an open question. In this work, we propose a novel approach for exploring graph-structured data motivated by two theories of human curiosity: the information gap theory and the compression progress theory. The theories view curiosity as an intrinsic motivation to optimize for topological features of subgraphs induced by the visited nodes in the environment. We use these proposed features as rewards for graph neural-network-based reinforcement learning. On multiple classes of synthetically generated graphs, we find that trained agents generalize to larger environments and to longer exploratory walks than are seen during training. Our method computes more efficiently than the greedy evaluation of the relevant topological properties. The proposed intrinsic motivations bear particular relevance for recommender systems. We demonstrate that curiosity-based recommendations are more predictive of human behavior than PageRank centrality for several real-world graph datasets, including MovieLens, Amazon Books, and Wikispeedia.
    Empowering recommender systems using automatically generated Knowledge Graphs and Reinforcement Learning. (arXiv:2307.04996v1 [cs.IR])
Personalized recommendations have a growing importance in direct marketing, which motivates research to enhance customer experiences by knowledge graph (KG) applications. For example, in financial services, companies may benefit from providing relevant financial articles to their customers to cultivate relationships, foster client engagement and promote informed financial decisions. While several approaches center on KG-based recommender systems for improved content, in this study we focus on interpretable KG-based recommender systems for decision making. To this end, we present two knowledge graph-based approaches for personalized article recommendations for a set of customers of a large multinational financial services company. The first approach employs Reinforcement Learning and the second approach uses the XGBoost algorithm for recommending articles to the customers. Both approaches make use of a KG generated from both structured (tabular data) and unstructured data (a large body of text data). Using the Reinforcement Learning-based recommender system we could leverage the graph traversal path leading to the recommendation as a way to generate interpretations (Path Directed Reasoning (PDR)). In the XGBoost-based approach, one can also provide explainable results using post-hoc methods such as SHAP (SHapley Additive exPlanations) and ELI5 (Explain Like I am Five). Importantly, our approach offers explainable results, promoting better decision-making. This study underscores the potential of combining advanced machine learning techniques with KG-driven insights to bolster experience in customer relationship management.
    BayesFlow: Amortized Bayesian Workflows With Neural Networks. (arXiv:2306.16015v2 [cs.LG] UPDATED)
    Modern Bayesian inference involves a mixture of computational techniques for estimating, validating, and drawing conclusions from probabilistic models as part of principled workflows for data analysis. Typical problems in Bayesian workflows are the approximation of intractable posterior distributions for diverse model types and the comparison of competing models of the same process in terms of their complexity and predictive performance. This manuscript introduces the Python library BayesFlow for simulation-based training of established neural network architectures for amortized data compression and inference. Amortized Bayesian inference, as implemented in BayesFlow, enables users to train custom neural networks on model simulations and re-use these networks for any subsequent application of the models. Since the trained networks can perform inference almost instantaneously, the upfront neural network training is quickly amortized.
    Self Expanding Neural Networks. (arXiv:2307.04526v2 [cs.LG] UPDATED)
    The results of training a neural network are heavily dependent on the architecture chosen; and even a modification of only the size of the network, however small, typically involves restarting the training process. In contrast to this, we begin training with a small architecture, only increase its capacity as necessary for the problem, and avoid interfering with previous optimization while doing so. We thereby introduce a natural gradient based approach which intuitively expands both the width and depth of a neural network when this is likely to substantially reduce the hypothetical converged training loss. We prove an upper bound on the "rate" at which neurons are added, and a computationally cheap lower bound on the expansion score. We illustrate the benefits of such Self-Expanding Neural Networks in both classification and regression problems, including those where the appropriate architecture size is substantially uncertain a priori.
    TOAST: Transfer Learning via Attention Steering. (arXiv:2305.15542v2 [cs.CV] UPDATED)
    Transfer learning involves adapting a pre-trained model to novel downstream tasks. However, we observe that current transfer learning methods often fail to focus on task-relevant features. In this work, we explore refocusing model attention for transfer learning. We introduce Top-Down Attention Steering (TOAST), a novel transfer learning algorithm that keeps the pre-trained backbone frozen, selects task-relevant features in the output, and feeds those features back to the model to steer the attention to the task-specific features. By refocusing the attention only, TOAST achieves state-of-the-art results on a number of transfer learning benchmarks, while having a small number of tunable parameters. Compared to fully fine-tuning, LoRA, and prompt tuning, TOAST substantially improves performance across a range of fine-grained visual classification datasets (e.g., 81.1% -> 86.2% on FGVC). TOAST also outperforms the fully fine-tuned Alpaca and Vicuna models on instruction-following language generation. Code is available at https://github.com/bfshi/TOAST.
    Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features. (arXiv:2307.05454v1 [cs.CL])
A challenge towards developing NLP systems for the world's languages is understanding how they generalize to typological differences relevant for real-world applications. To this end, we propose M2C, a morphologically-aware framework for behavioral testing of NLP models. We use M2C to generate tests that probe models' behavior in light of specific linguistic features in 12 typologically diverse languages. We evaluate state-of-the-art language models on the generated tests. While models excel at most tests in English, we highlight generalization failures on specific typological characteristics such as temporal expressions in Swahili and compounding possessives in Finnish. Our findings motivate the development of models that address these blind spots.
    Smooth Monotonic Networks. (arXiv:2306.01147v2 [cs.LG] UPDATED)
    Monotonicity constraints are powerful regularizers in statistical modelling. They can support fairness in computer supported decision making and increase plausibility in data-driven scientific models. The seminal min-max (MM) neural network architecture ensures monotonicity, but often gets stuck in undesired local optima during training because of vanishing gradients. We propose a simple modification of the MM network using strictly-increasing smooth non-linearities that alleviates this problem. The resulting smooth min-max (SMM) network module inherits the asymptotic approximation properties from the MM architecture. It can be used within larger deep learning systems trained end-to-end. The SMM module is considerably simpler and less computationally demanding than state-of-the-art neural networks for monotonic modelling. Still, in our experiments, it compared favorably to alternative neural and non-neural approaches in terms of generalization performance.
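A minimal sketch of a smooth min-max module, assuming log-sum-exp as the smooth, strictly increasing surrogate for min/max; the grouping and other details of the SMM architecture may differ from the paper:

```python
import torch

def smooth_max(x, beta, dim):
    # Smooth, strictly increasing surrogate for max (beta -> inf recovers max).
    return torch.logsumexp(beta * x, dim=dim) / beta

class SmoothMinMax(torch.nn.Module):
    def __init__(self, in_dim, n_groups=4, group_size=4, beta=10.0):
        super().__init__()
        self.w = torch.nn.Parameter(torch.randn(n_groups, group_size, in_dim))
        self.b = torch.nn.Parameter(torch.zeros(n_groups, group_size))
        self.beta = beta

    def forward(self, x):                   # x: (batch, in_dim)
        # Squared weights make every linear unit non-decreasing in x;
        # smooth max within groups and smooth min across groups preserve this.
        z = torch.einsum("gki,bi->bgk", self.w ** 2, x) + self.b
        return -smooth_max(-smooth_max(z, self.beta, 2), self.beta, 1)

print(SmoothMinMax(3)(torch.randn(5, 3)).shape)   # torch.Size([5])
```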
    Comparison of High-Dimensional Bayesian Optimization Algorithms on BBOB. (arXiv:2303.00890v2 [cs.LG] UPDATED)
Bayesian Optimization (BO) is a class of black-box, surrogate-based heuristics that can efficiently optimize problems that are expensive to evaluate, and hence admit only small evaluation budgets. BO is particularly popular for solving numerical optimization problems in industry, where the evaluation of objective functions often relies on time-consuming simulations or physical experiments. However, many industrial problems depend on a large number of parameters. This poses a challenge for BO algorithms, whose performance is often reported to suffer when the dimension grows beyond 15 variables. Although many new algorithms have been proposed to address this problem, it is not well understood which one is the best for which optimization scenario. In this work, we compare five state-of-the-art high-dimensional BO algorithms, along with vanilla BO and CMA-ES, on the 24 BBOB functions of the COCO environment at increasing dimensionality, ranging from 10 to 60 variables. Our results confirm the superiority of BO over CMA-ES for limited evaluation budgets and suggest that the most promising approach to improve BO is the use of trust regions. However, we also observe significant performance differences for different function landscapes and budget exploitation phases, indicating improvement potential, e.g., through hybridization of algorithmic components.
    Capafoldable: self-tracking foldable smart textiles with capacitive sensing. (arXiv:2307.05370v1 [cs.HC])
Folding is a unique structural technique to endow planar materials with motion or 3D mechanical properties. Textile-based capacitive sensing has been shown to be sensitive to the geometry deformation and relative motion of conductive textiles. In this work, we propose a novel self-tracking foldable smart textile by combining folded fabric structures and capacitive sensing to detect structural motions using state-of-the-art sensing circuits and deep learning technologies. We created two folding patterns, Accordion and Chevron, each with two layouts of capacitive sensors in the form of thermobonded conductive textile patches. In an experiment of manually moving patches of the folding patterns, we developed a deep neural network to learn and reconstruct the vision-tracked shape of the patches. Through our approach, the geometry primitives defining the patch shape can be reconstructed from the capacitive signals with an R-squared value of up to 95\% and a tracking error of 1 cm for 22.5 cm long patches. With its mechanical, electrical, and sensing properties, Capafoldable could enable a new range of smart textile applications.
    Differential Analysis of Triggers and Benign Features for Black-Box DNN Backdoor Detection. (arXiv:2307.05422v1 [cs.CR])
    This paper proposes a data-efficient detection method for deep neural networks against backdoor attacks under a black-box scenario. The proposed approach is motivated by the intuition that features corresponding to triggers have a higher influence in determining the backdoored network output than any other benign features. To quantitatively measure the effects of triggers and benign features on determining the backdoored network output, we introduce five metrics. To calculate the five-metric values for a given input, we first generate several synthetic samples by injecting the input's partial contents into clean validation samples. Then, the five metrics are computed by using the output labels of the corresponding synthetic samples. One contribution of this work is the use of a tiny clean validation dataset. Having the computed five metrics, five novelty detectors are trained from the validation dataset. A meta novelty detector fuses the output of the five trained novelty detectors to generate a meta confidence score. During online testing, our method determines if online samples are poisoned or not via assessing their meta confidence scores output by the meta novelty detector. We show the efficacy of our methodology through a broad range of backdoor attacks, including ablation studies and comparison to existing approaches. Our methodology is promising since the proposed five metrics quantify the inherent differences between clean and poisoned samples. Additionally, our detection method can be incrementally improved by appending more metrics that may be proposed to address future advanced attacks.
    Selective Sampling and Imitation Learning via Online Regression. (arXiv:2307.04998v1 [cs.LG])
We consider the problem of Imitation Learning (IL) by actively querying a noisy expert for feedback. While imitation learning has been empirically successful, much of the prior work assumes access to noiseless expert feedback, which is not practical in many applications. In fact, when one only has access to noisy expert feedback, algorithms that rely on purely offline data (non-interactive IL) can be shown to need a prohibitively large number of samples to be successful. In contrast, in this work, we provide an interactive algorithm for IL that uses selective sampling to actively query the noisy expert for feedback. Our contributions are twofold: First, we provide a new selective sampling algorithm that works with general function classes and multiple actions, and obtains the best-known bounds for the regret and the number of queries. Next, we extend this analysis to the problem of IL with noisy expert feedback and provide a new IL algorithm that makes limited queries. Our algorithm for selective sampling leverages function approximation, and relies on an online regression oracle w.r.t. the given model class to predict actions, and to decide whether to query the expert for its label. On the theoretical side, the regret bound of our algorithm is upper bounded by the regret of the online regression oracle, while the query complexity additionally depends on the eluder dimension of the model class. We complement this with a lower bound that demonstrates that our results are tight. We extend our selective sampling algorithm for IL with general function approximation and provide bounds on both the regret and the number of queries made to the noisy expert. A key novelty here is that our regret and query complexity bounds only depend on the number of times the optimal policy (and not the noisy expert, or the learner) goes to states that have a small margin.
    Optimal Algorithms for Latent Bandits with Cluster Structure. (arXiv:2301.07040v3 [cs.LG] UPDATED)
    We consider the problem of latent bandits with cluster structure where there are multiple users, each with an associated multi-armed bandit problem. These users are grouped into \emph{latent} clusters such that the mean reward vectors of users within the same cluster are identical. At each round, a user, selected uniformly at random, pulls an arm and observes a corresponding noisy reward. The goal of the users is to maximize their cumulative rewards. This problem is central to practical recommendation systems and has received wide attention of late \cite{gentile2014online, maillard2014latent}. Now, if each user acts independently, then they would have to explore each arm independently and a regret of $\Omega(\sqrt{\mathsf{MNT}})$ is unavoidable, where $\mathsf{M}, \mathsf{N}$ are the number of arms and users, respectively. Instead, we propose LATTICE (Latent bAndiTs via maTrIx ComplEtion) which allows exploitation of the latent cluster structure to provide the minimax optimal regret of $\widetilde{O}(\sqrt{(\mathsf{M}+\mathsf{N})\mathsf{T}})$, when the number of clusters is $\widetilde{O}(1)$. This is the first algorithm to guarantee such strong regret bound. LATTICE is based on a careful exploitation of arm information within a cluster while simultaneously clustering users. Furthermore, it is computationally efficient and requires only $O(\log{\mathsf{T}})$ calls to an offline matrix completion oracle across all $\mathsf{T}$ rounds.
    CrysMMNet: Multimodal Representation for Crystal Property Prediction. (arXiv:2307.05390v1 [cond-mat.mtrl-sci])
Machine Learning models have emerged as a powerful tool for fast and accurate prediction of different crystalline properties. Existing state-of-the-art models rely on a single modality of crystal data, i.e., the crystal graph structure, where they construct a multi-graph by establishing edges between nearby atoms in 3D space and apply a GNN to learn the materials representation. Thereby, they successfully encode the local chemical semantics around the atoms but fail to capture important global periodic structural information like space group number, crystal symmetry, rotational information, etc., which influence different crystal properties. In this work, we leverage textual descriptions of materials to model global structural information alongside the graph structure and learn a more robust and enriched representation of crystalline materials. To this effect, we first curate a textual dataset for crystalline material databases containing descriptions of each material. Further, we propose CrysMMNet, a simple multi-modal framework, which fuses both structural and textual representations together to generate a joint multimodal representation of crystalline materials. We conduct extensive experiments on two benchmark datasets across ten different properties to show that CrysMMNet outperforms existing state-of-the-art baseline methods by a good margin. We also observe that fusing the textual representation with the crystal graph structure provides a consistent improvement for all the SOTA GNN models compared to their vanilla versions. We have shared the textual dataset that we curated for both benchmark material databases with the community for future use.
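To illustrate the fusion step, a minimal late-fusion sketch that concatenates a structure embedding with a text embedding before a property head; the encoders, dimensions, and fusion mechanism here are placeholders, not CrysMMNet's actual architecture:

```python
import torch

class LateFusion(torch.nn.Module):
    def __init__(self, d_graph=128, d_text=768, d_hidden=256):
        super().__init__()
        self.head = torch.nn.Sequential(
            torch.nn.Linear(d_graph + d_text, d_hidden),
            torch.nn.ReLU(),
            torch.nn.Linear(d_hidden, 1),       # scalar property prediction
        )

    def forward(self, graph_emb, text_emb):
        # Joint multimodal representation by simple concatenation.
        return self.head(torch.cat([graph_emb, text_emb], dim=-1))

print(LateFusion()(torch.randn(4, 128), torch.randn(4, 768)).shape)  # [4, 1]
```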
    To Raise or Not To Raise: The Autonomous Learning Rate Question. (arXiv:2106.08767v3 [cs.LG] UPDATED)
    There is a parameter ubiquitous throughout the deep learning world: learning rate. There is likewise a ubiquitous question: what should that learning rate be? The true answer to this question is often tedious and time consuming to obtain, and a great deal of arcane knowledge has accumulated in recent years over how to pick and modify learning rates to achieve optimal training performance. Moreover, the long hours spent carefully crafting the perfect learning rate can come to nothing the moment your network architecture, optimizer, dataset, or initial conditions change ever so slightly. But it need not be this way. We propose a new answer to the great learning rate question: the Autonomous Learning Rate Controller. Find it at https://github.com/fastestimator/ARC/tree/v2.0
    Secrets of RLHF in Large Language Models Part I: PPO. (arXiv:2307.04964v1 [cs.CL])
Large language models (LLMs) have formulated a blueprint for the advancement of artificial general intelligence. Their primary objective is to function as a human-centric (helpful, honest, and harmless) assistant. Alignment with humans assumes paramount significance, and reinforcement learning with human feedback (RLHF) emerges as the pivotal technological paradigm underpinning this pursuit. Current technical routes usually include \textbf{reward models} to measure human preferences, \textbf{Proximal Policy Optimization} (PPO) to optimize policy model outputs, and \textbf{process supervision} to improve step-by-step reasoning capabilities. However, due to the challenges of reward design, environment interaction, and agent training, coupled with the huge trial-and-error cost of large language models, there is a significant barrier for AI researchers working toward technical alignment and the safe deployment of LLMs. The stable training of RLHF remains a puzzle. In this first report, we dissect the framework of RLHF, re-evaluate the inner workings of PPO, and explore how the parts comprising PPO algorithms impact policy agent training. We identify policy constraints as the key factor for the effective implementation of the PPO algorithm. Therefore, we explore PPO-max, an advanced version of the PPO algorithm, to efficiently improve the training stability of the policy model. Based on our main results, we perform a comprehensive analysis of RLHF abilities compared with SFT models and ChatGPT. The absence of open-source implementations has posed significant challenges to the investigation of LLM alignment. Therefore, we are eager to release technical reports, reward models, and PPO code.
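For orientation, a minimal sketch of a clipped PPO surrogate with an explicit KL-style penalty toward the old policy, the kind of policy constraint the report identifies as key; the coefficients and the exact PPO-max modifications are assumptions here:

```python
import torch

def ppo_loss(logp_new, logp_old, advantages, clip_eps=0.2, kl_coef=0.05):
    # Clipped importance-weighted surrogate of standard PPO.
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
    # Penalty keeping the policy near its reference (crude KL(old||new) estimate,
    # valid since the samples come from the old policy).
    kl_penalty = (logp_old - logp_new).mean()
    return policy_loss + kl_coef * kl_penalty

adv = torch.randn(32)
lp_old = torch.randn(32)
lp_new = (lp_old + 0.01 * torch.randn(32)).requires_grad_()
ppo_loss(lp_new, lp_old, adv).backward()   # gradients flow to the new policy
```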
    Automated Detection of Double Nuclei Galaxies using GOTHIC and the Discovery of a Large Sample of Dual AGN. (arXiv:2011.12177v4 [astro-ph.GA] UPDATED)
    We present a novel algorithm to detect double nuclei galaxies (DNG) called GOTHIC (Graph BOosted iterated HIll Climbing) - that detects whether a given image of a galaxy has two or more closely separated nuclei. Our aim is to detect samples of dual or multiple active galactic nuclei (AGN) in galaxies. Although galaxy mergers are common, the detection of dual AGN is rare. Their detection is very important as they help us understand the formation of supermassive black hole (SMBH) binaries, SMBH growth and AGN feedback effects in multiple nuclei systems. There is thus a need for an algorithm to do a systematic survey of existing imaging data for the discovery of DNGs and dual AGN. We have tested GOTHIC on a known sample of DNGs and subsequently applied it to a sample of a million SDSS DR16 galaxies lying in the redshift range of 0 to 0.75 approximately, and have available spectroscopic data. We have detected 159 dual AGN in this sample, of which 2 are triple AGN systems. Our results show that dual AGN are not common, and triple AGN even rarer. The color (u-r) magnitude plots of the DNGs indicate that star formation is quenched as the nuclei come closer and as the AGN fraction increases. The quenching is especially prominent for dual/triple AGN galaxies that lie in the extreme end of the red sequence.
    Performance Optimization for Variable Bitwidth Federated Learning in Wireless Networks. (arXiv:2209.10200v3 [cs.LG] UPDATED)
    This paper considers improving wireless communication and computation efficiency in federated learning (FL) via model quantization. In the proposed bitwidth FL scheme, edge devices train and transmit quantized versions of their local FL model parameters to a coordinating server, which aggregates them into a quantized global model and synchronizes the devices. The goal is to jointly determine the bitwidths employed for local FL model quantization and the set of devices participating in FL training at each iteration. We pose this as an optimization problem that aims to minimize the training loss of quantized FL under a per-iteration device sampling budget and delay requirement. However, the formulated problem is difficult to solve without (i) a concrete understanding of how quantization impacts global ML performance and (ii) the ability of the server to construct estimates of this process efficiently. To address the first challenge, we analytically characterize how limited wireless resources and induced quantization errors affect the performance of the proposed FL method. Our results quantify how the improvement of FL training loss between two consecutive iterations depends on the device selection and quantization scheme as well as on several parameters inherent to the model being learned. Then, we show that the FL training process can be described as a Markov decision process and propose a model-based reinforcement learning (RL) method to optimize action selection over iterations. Compared to model-free RL, this model-based RL approach leverages the derived mathematical characterization of the FL training process to discover an effective device selection and quantization scheme without imposing additional device communication overhead. Simulation results show that the proposed FL algorithm can reduce the convergence time.
    Improving RNN-Transducers with Acoustic LookAhead. (arXiv:2307.05006v1 [cs.CL])
    RNN-Transducers (RNN-Ts) have gained widespread acceptance as an end-to-end model for speech to text conversion because of their high accuracy and streaming capabilities. A typical RNN-T independently encodes the input audio and the text context, and combines the two encodings by a thin joint network. While this architecture provides SOTA streaming accuracy, it also makes the model vulnerable to strong LM biasing which manifests as multi-step hallucination of text without acoustic evidence. In this paper we propose LookAhead that makes text representations more acoustically grounded by looking ahead into the future within the audio input. This technique yields a significant 5%-20% relative reduction in word error rate on both in-domain and out-of-domain evaluation sets.
    Number Systems for Deep Neural Network Architectures: A Survey. (arXiv:2307.05035v1 [cs.NE])
Deep neural networks (DNNs) have become an enabling component for a myriad of artificial intelligence applications. DNNs have sometimes shown superior performance, even compared to humans, in cases such as self-driving and health applications. Because of their computational complexity, deploying DNNs in resource-constrained devices still faces many challenges related to computing complexity, energy efficiency, latency, and cost. To this end, several research directions are being pursued by both academia and industry to accelerate and efficiently implement DNNs. One important direction is determining the appropriate data representation for the massive amount of data involved in DNN processing. Using conventional number systems has been found to be sub-optimal for DNNs. Alternatively, a great body of research focuses on exploring suitable number systems. This article aims to provide a comprehensive survey and discussion of alternative number systems for more efficient representations of DNN data. Various number systems (conventional/unconventional) exploited for DNNs are discussed. The impact of these number systems on the performance and hardware design of DNNs is considered. In addition, this paper highlights the challenges associated with each number system and various solutions that have been proposed for addressing them. The reader will be able to understand the importance of an efficient number system for DNNs, learn about the widely used number systems for DNNs, understand the trade-offs between various number systems, and consider various design aspects that affect the impact of number systems on DNN performance. In addition, recent trends and related research opportunities are highlighted.
    Learned Kernels for Interpretable and Efficient PPG Signal Quality Assessment and Artifact Segmentation. (arXiv:2307.05385v1 [eess.SP])
    Photoplethysmography (PPG) provides a low-cost, non-invasive method to continuously monitor various cardiovascular parameters. PPG signals are generated by wearable devices and frequently contain large artifacts caused by external factors, such as motion of the human subject. In order to ensure robust and accurate extraction of physiological parameters, corrupted areas of the signal need to be identified and handled appropriately. Previous methodology relied either on handcrafted feature detectors or signal metrics which yield sub-optimal performance, or relied on machine learning techniques such as deep neural networks (DNN) which lack interpretability and are computationally and memory intensive. In this work, we present a novel method to learn a small set of interpretable convolutional kernels that has performance similar to -- and often better than -- the state-of-the-art DNN approach with several orders of magnitude fewer parameters. This work allows for efficient, robust, and interpretable signal quality assessment and artifact segmentation on low-power devices.
    Fisher-Weighted Merge of Contrastive Learning Models in Sequential Recommendation. (arXiv:2307.05476v1 [cs.IR])
    Along with the exponential growth of online platforms and services, recommendation systems have become essential for identifying relevant items based on user preferences. The domain of sequential recommendation aims to capture evolving user preferences over time. To address dynamic preference, various contrastive learning methods have been proposed to target data sparsity, a challenge in recommendation systems due to the limited user-item interactions. In this paper, we are the first to apply the Fisher-Merging method to Sequential Recommendation, addressing and resolving practical challenges associated with it. This approach ensures robust fine-tuning by merging the parameters of multiple models, resulting in improved overall performance. Through extensive experiments, we demonstrate the effectiveness of our proposed methods, highlighting their potential to advance the state-of-the-art in sequential learning and recommendation systems.
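The core of Fisher merging is compact enough to sketch: average each parameter across models, weighted by its (diagonal) Fisher information, so coordinates a model is more certain about dominate the merge. The shapes and toy numbers below are illustrative:

```python
import torch

def fisher_merge(params, fishers, eps=1e-8):
    # Per-coordinate Fisher-weighted average of model parameters.
    num = sum(f * p for f, p in zip(fishers, params))
    return num / (sum(fishers) + eps)

p1, p2 = torch.tensor([1.0, 0.0]), torch.tensor([0.0, 2.0])
f1, f2 = torch.tensor([0.9, 0.1]), torch.tensor([0.1, 0.9])
print(fisher_merge([p1, p2], [f1, f2]))  # each coord follows the confident model
```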
    Neural network analysis of neutron and X-ray reflectivity data: Incorporating prior knowledge for tackling the phase problem. (arXiv:2307.05364v1 [eess.SP])
    Due to the lack of phase information, determining the physical parameters of multilayer thin films from measured neutron and X-ray reflectivity curves is, on a fundamental level, an underdetermined inverse problem. This so-called phase problem poses limitations on standard neural networks, constraining the range and number of considered parameters in previous machine learning solutions. To overcome this, we present an approach that utilizes prior knowledge to regularize the training process over larger parameter spaces. We demonstrate the effectiveness of our method in various scenarios, including multilayer structures with box model parameterization and a physics-inspired special parameterization of the scattering length density profile for a multilayer structure. By leveraging the input of prior knowledge, we can improve the training dynamics and address the underdetermined ("ill-posed") nature of the problem. In contrast to previous methods, our approach scales favorably when increasing the complexity of the inverse problem, working properly even for a 5-layer multilayer model and an N-layer periodic multilayer model with up to 17 open parameters.
    Uncertainty Quantification of the Virial Black Hole Mass with Conformal Prediction. (arXiv:2307.04993v1 [astro-ph.CO])
Precise measurements of the black hole mass are essential to gain insight into the co-evolution of black holes and their host galaxies. A direct measure of the black hole mass is often restricted to the nearest galaxies; instead, an indirect method using single-epoch virial black hole mass estimation is used for objects at high redshifts. However, this method is subject to biases and uncertainties as it relies on a scaling relation derived from a small sample of local active galactic nuclei. In this study, we propose the application of conformalised quantile regression (CQR) to quantify the uncertainties of black hole mass predictions in a machine learning setting. We compare CQR with various prediction interval techniques and demonstrate that CQR provides a more useful prediction interval indicator. In contrast to baseline approaches for prediction interval estimation, we show that the CQR method provides prediction intervals that adjust to the black hole mass and its related properties. That is, it yields a tighter constraint on the prediction interval (hence more certainty) for a larger black hole mass and, accordingly, for bright sources with broad spectral line widths. Using a combination of a neural network model and the CQR framework, the recovered virial black hole mass predictions and uncertainties are comparable to those measured from the Sloan Digital Sky Survey. The code is publicly available at https://github.com/yongsukyee/uncertain_blackholemass.
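Conformalized quantile regression itself is a short procedure (Romano et al., 2019): widen a pre-trained quantile band by the calibration quantile of conformity scores. A minimal sketch, with stand-in quantile models in place of the paper's neural network:

```python
import numpy as np

def cqr_interval(q_lo, q_hi, x_cal, y_cal, x_test, alpha=0.1):
    # Conformity score: how far a calibration point falls outside the band.
    scores = np.maximum(q_lo(x_cal) - y_cal, y_cal - q_hi(x_cal))
    n = len(y_cal)
    q_hat = np.quantile(scores, min(np.ceil((1 - alpha) * (n + 1)) / n, 1.0))
    return q_lo(x_test) - q_hat, q_hi(x_test) + q_hat    # adjusted interval

rng = np.random.default_rng(0)
x = rng.normal(size=500); y = x + rng.normal(scale=0.5, size=500)
q_lo = lambda t: t - 0.6; q_hi = lambda t: t + 0.6      # stand-in quantile fits
lo, hi = cqr_interval(q_lo, q_hi, x[:250], y[:250], x[250:])
print(np.mean((y[250:] >= lo) & (y[250:] <= hi)))       # ~0.9 empirical coverage
```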
    CareFall: Automatic Fall Detection through Wearable Devices and AI Methods. (arXiv:2307.05275v1 [cs.LG])
    The aging population has led to a growing number of falls in our society, affecting global public health worldwide. This paper presents CareFall, an automatic Fall Detection System (FDS) based on wearable devices and Artificial Intelligence (AI) methods. CareFall considers the accelerometer and gyroscope time signals extracted from a smartwatch. Two different approaches are used for feature extraction and classification: i) threshold-based, and ii) machine learning-based. Experimental results on two public databases show that the machine learning-based approach, which combines accelerometer and gyroscope information, outperforms the threshold-based approach in terms of accuracy, sensitivity, and specificity. This research contributes to the design of smart and user-friendly solutions to mitigate the negative consequences of falls among older people.
    Fast dynamic time warping and clustering in C++. (arXiv:2307.04904v1 [eess.SP])
    We present an approach for computationally efficient dynamic time warping (DTW) and clustering of time-series data. The method frames the dynamic warping of time series datasets as an optimisation problem solved using dynamic programming, and then clusters time series data by solving a second optimisation problem using mixed-integer programming (MIP). There is also an option to use k-medoids clustering for increased speed, when a certificate for global optimality is not essential. The improved efficiency of our approach is due to task-level parallelisation of the clustering alongside DTW. Our approach was tested using the UCR Time Series Archive, and was found to be, on average, 33% faster than the next fastest option when using the same clustering method. This increases to 64% faster when considering only larger datasets (with more than 1000 time series). The MIP clustering is most effective on small numbers of longer time series, because the DTW computation is faster than other approaches, but the clustering problem becomes increasingly computationally expensive as the number of time series to be clustered increases.
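For reference, the dynamic-programming core of DTW is only a few lines (the paper's C++ implementation adds task-level parallelism and the MIP clustering on top):

```python
import numpy as np

def dtw(a, b):
    # Classic O(len(a) * len(b)) dynamic program: D[i, j] is the best
    # cumulative cost of warping a[:i] onto b[:j].
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

print(dtw(np.array([0., 1., 2.]), np.array([0., 1., 1., 2.])))  # 0.0
```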
    Robust Inference of Manifold Density and Geometry by Doubly Stochastic Scaling. (arXiv:2209.08004v2 [math.ST] UPDATED)
    The Gaussian kernel and its traditional normalizations (e.g., row-stochastic) are popular approaches for assessing similarities between data points. Yet, they can be inaccurate under high-dimensional noise, especially if the noise magnitude varies considerably across the data, e.g., under heteroskedasticity or outliers. In this work, we investigate a more robust alternative -- the doubly stochastic normalization of the Gaussian kernel. We consider a setting where points are sampled from an unknown density on a low-dimensional manifold embedded in high-dimensional space and corrupted by possibly strong, non-identically distributed, sub-Gaussian noise. We establish that the doubly stochastic affinity matrix and its scaling factors concentrate around certain population forms, and provide corresponding finite-sample probabilistic error bounds. We then utilize these results to develop several tools for robust inference under general high-dimensional noise. First, we derive a robust density estimator that reliably infers the underlying sampling density and can substantially outperform the standard kernel density estimator under heteroskedasticity and outliers. Second, we obtain estimators for the pointwise noise magnitudes, the pointwise signal magnitudes, and the pairwise Euclidean distances between clean data points. Lastly, we derive robust graph Laplacian normalizations that accurately approximate various manifold Laplacians, including the Laplace Beltrami operator, improving over traditional normalizations in noisy settings. We exemplify our results in simulations and on real single-cell RNA-sequencing data. For the latter, we show that in contrast to traditional methods, our approach is robust to variability in technical noise levels across cell types.
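The doubly stochastic normalization can be computed with a symmetric Sinkhorn-type scaling; a minimal sketch (the damping and stopping rule here are assumptions, not the paper's algorithm):

```python
import numpy as np

def doubly_stochastic(X, sigma=1.0, n_iter=200):
    # Find d such that diag(d) @ K @ diag(d) has unit row/column sums.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2 * sigma ** 2))
    d = np.ones(len(X))
    for _ in range(n_iter):
        d = np.sqrt(d / (K @ d))     # damped fixed point: d_i * (K d)_i = 1
    return d[:, None] * K * d[None, :]

A = doubly_stochastic(np.random.default_rng(0).normal(size=(50, 3)))
print(A.sum(axis=1)[:3])             # each row sums to ~1
```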
    Score Function Gradient Estimation to Widen the Applicability of Decision-Focused Learning. (arXiv:2307.05213v1 [cs.LG])
Many real-world optimization problems contain unknown parameters that must be predicted prior to solving. To train the predictive machine learning (ML) models involved, the commonly adopted approach focuses on maximizing predictive accuracy. However, this approach does not always lead to the minimization of the downstream task loss. Decision-focused learning (DFL) is a recently proposed paradigm whose goal is to train the ML model by directly minimizing the task loss. However, state-of-the-art DFL methods are limited by the assumptions they make about the structure of the optimization problem (e.g., that the problem is linear) and by the fact that they can only predict parameters that appear in the objective function. In this work, we address these limitations by instead predicting \textit{distributions} over parameters and adopting score function gradient estimation (SFGE) to compute decision-focused updates to the predictive model, thereby widening the applicability of DFL. Our experiments show that by using SFGE we can: (1) deal with predictions that occur both in the objective function and in the constraints; and (2) effectively tackle two-stage stochastic optimization problems.
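A minimal sketch of the score-function (REINFORCE) estimator at the heart of SFGE: the task loss is only evaluated, never differentiated, so it can pass through an arbitrary solver. The Gaussian parameterization and baseline below are illustrative assumptions:

```python
import torch

def sfge_surrogate(mu, log_sigma, task_loss, n_samples=64):
    dist = torch.distributions.Normal(mu, log_sigma.exp())
    theta = dist.sample((n_samples,))              # predicted parameter draws
    losses = torch.tensor([task_loss(t) for t in theta])
    logp = dist.log_prob(theta).sum(-1)
    baseline = losses.mean()                       # simple variance reduction
    # Differentiating this surrogate yields E[(loss - b) * grad log p(theta)].
    return ((losses - baseline) * logp).mean()

mu = torch.zeros(2, requires_grad=True)
log_sigma = torch.zeros(2, requires_grad=True)
sfge_surrogate(mu, log_sigma, lambda t: float((t ** 2).sum())).backward()
print(mu.grad, log_sigma.grad)                     # decision-focused updates
```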
    FairLay-ML: Intuitive Remedies for Unfairness in Data-Driven Social-Critical Algorithms. (arXiv:2307.05029v1 [cs.LG])
    This thesis explores open-sourced machine learning (ML) model explanation tools to understand whether these tools can allow a layman to visualize, understand, and suggest intuitive remedies to unfairness in ML-based decision-support systems. Machine learning models trained on datasets biased against minority groups are increasingly used to guide life-altering social decisions, prompting the urgent need to study their logic for unfairness. Due to this problem's impact on vast populations of the general public, it is critical for the layperson -- not just subject matter experts in social justice or machine learning experts -- to understand the nature of unfairness within these algorithms and the potential trade-offs. Existing research on fairness in machine learning focuses mostly on the mathematical definitions and tools to understand and remedy unfair models, with some directly citing user-interactive tools as necessary for future work. This thesis presents FairLay-ML, a proof-of-concept GUI integrating some of the most promising tools to provide intuitive explanations for unfair logic in ML models by integrating existing research tools (e.g. Local Interpretable Model-Agnostic Explanations) with existing ML-focused GUI (e.g. Python Streamlit). We test FairLay-ML using models of various accuracy and fairness generated by an unfairness detector tool, Parfait-ML, and validate our results using Themis. Our study finds that the technology stack used for FairLay-ML makes it easy to install and provides real-time black-box explanations of pre-trained models to users. Furthermore, the explanations provided translate to actionable remedies.
    M$^2$Hub: Unlocking the Potential of Machine Learning for Materials Discovery. (arXiv:2307.05378v1 [cond-mat.mtrl-sci])
We introduce M$^2$Hub, a toolkit for advancing machine learning in materials discovery. Machine learning has achieved remarkable progress in modeling molecular structures, especially biomolecules for drug discovery. However, the development of machine learning approaches for modeling materials structures lags behind, which is partly due to the lack of an integrated platform that enables access to diverse tasks for materials discovery. To bridge this gap, M$^2$Hub enables easy access to materials discovery tasks, datasets, machine learning methods, evaluations, and benchmark results that cover the entire workflow. Specifically, the first release of M$^2$Hub focuses on three key stages in materials discovery: virtual screening, inverse design, and molecular simulation, including 9 datasets that cover 6 types of materials with 56 tasks across 8 types of material properties. We further provide 2 synthetic datasets for the purpose of generative tasks on materials. In addition to random data splits, we also provide 3 additional data partitions to reflect real-world materials discovery scenarios. State-of-the-art machine learning methods (including those suitable for materials structures but never compared in the literature) are benchmarked on representative tasks. Our codes and library are publicly available at https://github.com/yuanqidu/M2Hub.
    Conformalization of Sparse Generalized Linear Models. (arXiv:2307.05109v1 [cs.LG])
    Given a sequence of observable variables $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, the conformal prediction method estimates a confidence set for $y_{n+1}$ given $x_{n+1}$ that is valid for any finite sample size by merely assuming that the joint distribution of the data is permutation invariant. Although attractive, computing such a set is computationally infeasible in most regression problems. Indeed, in these cases, the unknown variable $y_{n+1}$ can take an infinite number of possible candidate values, and generating conformal sets requires retraining a predictive model for each candidate. In this paper, we focus on a sparse linear model with only a subset of variables for prediction and use numerical continuation techniques to approximate the solution path efficiently. The critical property we exploit is that the set of selected variables is invariant under a small perturbation of the input data. Therefore, it is sufficient to enumerate and refit the model only at the change points of the set of active features and smoothly interpolate the rest of the solution via a Predictor-Corrector mechanism. We show how our path-following algorithm accurately approximates conformal prediction sets and illustrate its performance using synthetic and real data examples.
    SuryaKiran at MEDIQA-Sum 2023: Leveraging LoRA for Clinical Dialogue Summarization. (arXiv:2307.05162v1 [cs.CL])
Finetuning Large Language Models helps improve results for domain-specific use cases. End-to-end finetuning of large language models is time- and resource-intensive and has high storage requirements for storing the finetuned version of the large language model. Parameter Efficient Fine Tuning (PEFT) methods address the time and resource challenges by keeping the large language model as a fixed base and adding additional layers, which the PEFT methods finetune. This paper presents evaluation results for one such PEFT method, Low Rank Adaptation (LoRA), on Clinical Dialogue Summarization. The evaluation results show that LoRA works on par with end-to-end finetuning of a large language model. The paper presents the evaluations done for solving both Subtask A and Subtask B from ImageCLEFmedical (https://www.imageclef.org/2023/medical).
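LoRA itself is easy to sketch: freeze the pretrained weight and learn a low-rank additive update, so only a small fraction of parameters is trained and stored. A minimal version for one linear layer (the rank and scaling values are illustrative):

```python
import torch

class LoRALinear(torch.nn.Module):
    def __init__(self, base: torch.nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                # pretrained weights stay fixed
        self.A = torch.nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r                     # effective weight: W + scale * B @ A

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(torch.nn.Linear(768, 768))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 12288
```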
    Portfolio Optimization: A Comparative Study. (arXiv:2307.05048v1 [q-fin.PM])
    Portfolio optimization has been an area that has attracted considerable attention from the financial research community. Designing a profitable portfolio is a challenging task involving precise forecasting of future stock returns and risks. This chapter presents a comparative study of three portfolio design approaches, the mean-variance portfolio (MVP), hierarchical risk parity (HRP)-based portfolio, and autoencoder-based portfolio. These three approaches to portfolio design are applied to the historical prices of stocks chosen from ten thematic sectors listed on the National Stock Exchange (NSE) of India. The portfolios are designed using the stock price data from January 1, 2018, to December 31, 2021, and their performances are tested on the out-of-sample data from January 1, 2022, to December 31, 2022. Extensive results are analyzed on the performance of the portfolios. It is observed that the performance of the MVP portfolio is the best on the out-of-sample data for the risk-adjusted returns. However, the autoencoder portfolios outperformed their counterparts on annual returns.
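    For readers unfamiliar with the first of the three approaches, here is a minimal mean-variance (MVP) sketch under illustrative assumptions: long-only weights chosen by maximizing a Sharpe-like ratio on a toy returns matrix, not the chapter's NSE data or exact formulation.

```python
# Mean-variance portfolio weights via scipy (long-only, fully invested).
import numpy as np
from scipy.optimize import minimize

def mean_variance_weights(returns, risk_free=0.0):
    mu = returns.mean(axis=0)                    # expected returns per asset
    cov = np.cov(returns, rowvar=False)          # sample covariance
    n = len(mu)

    def neg_sharpe(w):
        ret = w @ mu - risk_free
        vol = np.sqrt(w @ cov @ w)
        return -ret / vol

    cons = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)  # weights sum to 1
    bounds = [(0.0, 1.0)] * n                                  # long-only
    w0 = np.full(n, 1.0 / n)
    res = minimize(neg_sharpe, w0, bounds=bounds, constraints=cons)
    return res.x

rng = np.random.default_rng(1)
daily = 0.0005 + 0.01 * rng.normal(size=(750, 5))   # ~3 years, 5 toy assets
print(np.round(mean_variance_weights(daily), 3))
```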
    Differentially Private Statistical Inference through $\beta$-Divergence One Posterior Sampling. (arXiv:2307.05194v1 [stat.ML])
    Differential privacy guarantees allow the results of a statistical analysis involving sensitive data to be released without compromising the privacy of any individual taking part. Achieving such guarantees generally requires the injection of noise, either directly into parameter estimates or into the estimation process. Instead of artificially introducing perturbations, sampling from Bayesian posterior distributions has been shown to be a special case of the exponential mechanism, producing consistent, and efficient private estimates without altering the data generative process. The application of current approaches has, however, been limited by their strong bounding assumptions which do not hold for basic models, such as simple linear regressors. To ameliorate this, we propose $\beta$D-Bayes, a posterior sampling scheme from a generalised posterior targeting the minimisation of the $\beta$-divergence between the model and the data generating process. This provides private estimation that is generally applicable without requiring changes to the underlying model and consistently learns the data generating parameter. We show that $\beta$D-Bayes produces more precise inference estimation for the same privacy guarantees, and further facilitates differentially private estimation via posterior sampling for complex classifiers and continuous regression models such as neural networks for the first time.
    A Mapping Study of Machine Learning Methods for Remaining Useful Life Estimation of Lead-Acid Batteries. (arXiv:2307.05163v1 [cs.LG])
    Energy storage solutions play an increasingly important role in modern infrastructure and lead-acid batteries are among the most commonly used in the rechargeable category. Due to normal degradation over time, correctly determining the battery's State of Health (SoH) and Remaining Useful Life (RUL) contributes to enhancing predictive maintenance, reliability, and longevity of battery systems. Besides improving cost savings, correct estimation of the SoH can lead to reduced pollution through reuse of retired batteries. This paper presents a mapping study of the state-of-the-art in machine learning methods for estimating the SoH and RUL of lead-acid batteries. These two indicators are critical in the battery management systems of electric vehicles, renewable energy systems, and other applications that rely heavily on this battery technology. In this study, we analyzed the types of machine learning algorithms employed for estimating SoH and RUL, and evaluated their performance in terms of accuracy and inference time. Additionally, this mapping identifies and analyzes the most commonly used combinations of sensors in specific applications, such as vehicular batteries. The mapping concludes by highlighting potential gaps and opportunities for future research, which lays the foundation for further advancements in the field.
    Tracking Most Significant Shifts in Nonparametric Contextual Bandits. (arXiv:2307.05341v1 [stat.ML])
    We study nonparametric contextual bandits where Lipschitz mean reward functions may change over time. We first establish the minimax dynamic regret rate in this less understood setting in terms of the number of changes $L$ and the total variation $V$, both capturing all changes in distribution over the context space, and argue that state-of-the-art procedures are suboptimal in this setting. Next, we address the question of adaptivity for this setting, i.e., achieving the minimax rate without knowledge of $L$ or $V$. Quite importantly, we posit that the bandit problem, viewed locally at a given context $X_t$, should not be affected by reward changes in other parts of the context space $\cal X$. We therefore propose a notion of change, which we term experienced significant shifts, that better accounts for locality, and thus counts considerably fewer changes than $L$ and $V$. Furthermore, similar to recent work on non-stationary MAB (Suk & Kpotufe, 2022), experienced significant shifts only count the most significant changes in mean rewards, e.g., severe best-arm changes relevant to observed contexts. Our main result is to show that this more tolerant notion of change can in fact be adapted to.
    Distributed Pruning Towards Tiny Neural Networks in Federated Learning. (arXiv:2212.01977v2 [cs.LG] UPDATED)
    Neural network pruning is an essential technique for reducing the size and complexity of deep neural networks, enabling large-scale models on devices with limited resources. However, existing pruning approaches heavily rely on training data for guiding the pruning strategies, making them ineffective for federated learning over distributed and confidential datasets. Additionally, the memory- and computation-intensive pruning process becomes infeasible for resource-constrained devices in federated learning. To address these challenges, we propose FedTiny, a distributed pruning framework for federated learning that generates specialized tiny models for memory- and computing-constrained devices. We introduce two key modules in FedTiny to adaptively search coarse- and finer-pruned specialized models to fit deployment scenarios with sparse and cheap local computation. First, an adaptive batch normalization selection module is designed to mitigate biases in pruning caused by the heterogeneity of local data. Second, a lightweight progressive pruning module aims to finer prune the models under strict memory and computational budgets, allowing the pruning policy for each layer to be gradually determined rather than evaluating the overall model structure. The experimental results demonstrate the effectiveness of FedTiny, which outperforms state-of-the-art approaches, particularly when compressing deep models to extremely sparse tiny models. FedTiny achieves an accuracy improvement of 2.61% while significantly reducing the computational cost by 95.91% and the memory footprint by 94.01% compared to state-of-the-art methods.
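    The pruning primitive underlying such frameworks can be sketched as follows; note this only shows generic iterative magnitude pruning in PyTorch, not FedTiny's adaptive batch-normalization selection or budget-aware progressive module.

```python
# Iterative magnitude pruning with torch.nn.utils.prune.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

def prune_step(model, amount=0.2):
    """Mask out the smallest-magnitude `amount` fraction of remaining weights."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=amount)

for round_ in range(3):          # progressively sparsify over rounds
    prune_step(model, amount=0.2)
    # ... local training on device data would go here ...

# Fold the pruning masks into the weights permanently.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.remove(module, "weight")

total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"sparsity: {zeros / total:.1%}")
```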
    Improving Image-Based Precision Medicine with Uncertainty-Aware Causal Models. (arXiv:2305.03829v2 [cs.LG] UPDATED)
    Image-based precision medicine aims to personalize treatment decisions based on an individual's unique imaging features so as to improve their clinical outcome. Machine learning frameworks that integrate uncertainty estimation as part of their treatment recommendations would be safer and more reliable. However, little work has been done in adapting uncertainty estimation techniques and validation metrics for precision medicine. In this paper, we use Bayesian deep learning for estimating the posterior distribution over factual and counterfactual outcomes on several treatments. This allows for estimating the uncertainty for each treatment option and for the individual treatment effects (ITE) between any two treatments. We train and evaluate this model to predict future new and enlarging T2 lesion counts on a large, multi-center dataset of MR brain images of patients with multiple sclerosis, exposed to several treatments during randomized controlled trials. We evaluate the correlation of the uncertainty estimate with the factual error, and, given the lack of ground truth counterfactual outcomes, demonstrate how uncertainty for the ITE prediction relates to bounds on the ITE error. Lastly, we demonstrate how knowledge of uncertainty could modify clinical decision-making to improve individual patient and clinical trial outcomes.
    Deep Probabilistic Movement Primitives with a Bayesian Aggregator. (arXiv:2307.05141v1 [cs.RO])
    Movement primitives are trainable parametric models that reproduce robotic movements starting from a limited set of demonstrations. Previous works proposed simple linear models that exhibited high sample efficiency and generalization power by allowing temporal modulation of movements (reproducing movements faster or slower), blending (merging two movements into one), via-point conditioning (constraining a movement to meet particular via-points) and context conditioning (generation of movements based on an observed variable, e.g., the position of an object). More recent works have proposed neural network-based motor primitive models and demonstrated their capacity to perform tasks with some forms of input conditioning or time-modulation representations. However, no single unified deep movement primitive model capable of all the previous operations has been proposed, limiting the potential applications of neural movement primitives. This paper proposes a deep movement primitive architecture that encodes all the operations above and uses a Bayesian context aggregator that allows a sounder context conditioning and blending. Our results demonstrate that our approach can scale to reproduce complex motions on a larger variety of input choices compared to baselines while maintaining the operations that linear movement primitives provide.
    Probabilistic Counterexample Guidance for Safer Reinforcement Learning. (arXiv:2307.04927v1 [cs.LG])
    Safe exploration aims at addressing the limitations of Reinforcement Learning (RL) in safety-critical scenarios, where failures during trial-and-error learning may incur high costs. Several methods exist to incorporate external knowledge or to use proximal sensor data to limit the exploration of unsafe states. However, reducing exploration risks in unknown environments, where an agent must discover safety threats during exploration, remains challenging. In this paper, we target the problem of safe exploration by guiding the training with counterexamples of the safety requirement. Our method abstracts both continuous and discrete state-space systems into compact abstract models representing the safety-relevant knowledge acquired by the agent during exploration. We then exploit probabilistic counterexample generation to construct minimal simulation submodels eliciting safety requirement violations, where the agent can efficiently train offline to refine its policy towards minimising the risk of safety violations during the subsequent online exploration. We demonstrate our method's effectiveness in reducing safety violations during online exploration in preliminary experiments by an average of 40.3% compared with QL and DQN standard algorithms and 29.1% compared with previous related work, while achieving comparable cumulative rewards with respect to unrestricted exploration and alternative approaches.
    One-Versus-Others Attention: Scalable Multimodal Integration. (arXiv:2307.05435v1 [cs.LG])
    Multimodal learning models have become increasingly important as they surpass single-modality approaches on diverse tasks ranging from question-answering to autonomous driving. Despite the importance of multimodal learning, existing efforts focus on NLP applications, where the number of modalities is typically less than four (audio, video, text, images). However, data inputs in other domains, such as the medical field, may include X-rays, PET scans, MRIs, genetic screening, clinical notes, and more, creating a need for both efficient and accurate information fusion. Many state-of-the-art models rely on pairwise cross-modal attention, which does not scale well for applications with more than three modalities. For $n$ modalities, computing attention will result in $n \choose 2$ operations, potentially requiring considerable amounts of computational resources. To address this, we propose a new domain-neutral attention mechanism, One-Versus-Others (OvO) attention, that scales linearly with the number of modalities and requires only $n$ attention operations, thus offering a significant reduction in computational complexity compared to existing cross-modal attention algorithms. Using three diverse real-world datasets as well as an additional simulation experiment, we show that our method improves performance compared to popular fusion techniques while decreasing computation costs.
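    A hedged sketch of what a one-versus-others pattern might look like in PyTorch is given below: each modality attends once to the concatenation of the remaining modalities, so the number of attention operations grows linearly in $n$. The aggregation scheme and projection shapes here are assumptions for illustration, not the paper's exact formulation.

```python
# One-versus-others style attention: n attention ops instead of (n choose 2).
import torch
import torch.nn as nn
import torch.nn.functional as F

class OneVersusOthersAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, modalities):
        # modalities: list of n tensors, each of shape (batch, tokens, dim)
        outputs = []
        for i, x in enumerate(modalities):
            # gather all *other* modalities into one key/value sequence
            others = torch.cat([m for j, m in enumerate(modalities) if j != i], dim=1)
            attn = F.softmax(self.q(x) @ self.k(others).transpose(1, 2) * self.scale, dim=-1)
            outputs.append(attn @ self.v(others))   # one attention op per modality
        return outputs

mods = [torch.randn(2, t, 32) for t in (5, 7, 4, 6)]   # four toy modalities
fused = OneVersusOthersAttention(32)(mods)
print([o.shape for o in fused])
```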
    Restoring the saturation response of a PMT using pulse-shape and artificial-neural-networks. (arXiv:2302.06170v3 [physics.ins-det] UPDATED)
    The linear response of a photomultiplier tube (PMT) is a required property for photon counting and reconstruction of the neutrino energy. The valid linearity region and the saturation response of the PMT were investigated using a linear-alkyl-benzene (LAB)-based liquid scintillator. A correlation was observed between the two different saturation responses, pulse-shape distortion and pulse-area decrease. The observed pulse-shape provides useful information for estimating the linearity region relative to the pulse-area. This correlation-based diagnosis allows an in-situ estimation of the linearity range, which was previously challenging. The measured correlation between the two saturation responses was employed to train an artificial-neural-network (ANN) to predict the decrease in pulse-area from the observed pulse-shape. The ANN-predicted pulse-area decrease enables the prediction of the ideal number of photoelectrons irrespective of the saturation behavior. This pulse-shape-based machine learning technique offers a novel method for restoring the saturation response of PMTs.
    Hierarchical Classification of Research Fields in the "Web of Science" Using Deep Learning. (arXiv:2302.00390v2 [cs.DL] UPDATED)
    This paper presents a hierarchical classification system that automatically categorizes a scholarly publication using its abstract into a three-tier hierarchical label set (discipline, field, subfield) in a multi-class setting. This system enables a holistic categorization of research activities in the mentioned hierarchy in terms of knowledge production through articles and impact through citations, permitting those activities to fall into multiple categories. The classification system distinguishes 44 disciplines, 718 fields and 1,485 subfields among 160 million abstract snippets in Microsoft Academic Graph (version 2018-05-17). We used batch training in a modularized and distributed fashion to address and allow for interdisciplinary and interfield classifications in single-label and multi-label settings. In total, we have conducted 3,140 experiments in all considered models (Convolutional Neural Networks, Recurrent Neural Networks, Transformers). The classification accuracy is > 90% in 77.13% and 78.19% of the single-label and multi-label classifications, respectively. We examine the advantages of our classification by its ability to better align research texts and output with disciplines, to adequately classify them in an automated way, and to capture the degree of interdisciplinarity. The proposed system (a set of pre-trained models) can serve as a backbone to an interactive system for indexing scientific publications in the future.
    Simplicial Message Passing for Chemical Property Prediction. (arXiv:2307.05392v1 [cond-mat.mtrl-sci])
    Recently, message-passing neural networks (MPNNs) have provided a promising tool for dealing with molecular graphs and have achieved remarkable success in facilitating the discovery and design of materials with desired properties. However, classical MPNN methods also suffer from a limitation in capturing the strong topological information hidden in molecular structures, such as nonisomorphic graphs. To address this problem, this work proposes a Simplicial Message Passing (SMP) framework to better capture the topological information from molecules, which can break through the limitation of the vanilla message-passing paradigm. In SMP, a generalized message-passing framework is established for aggregating information from arbitrary-order simplicial complexes, and a hierarchical structure is elaborated to allow information exchange between simplices of different orders. We apply the SMP framework within deep learning architectures for quantum-chemical property prediction and achieve state-of-the-art results. The results show that, compared to traditional MPNNs, involving higher-order simplices can better capture the complex structure of molecules and substantially enhance task performance. The SMP-based model can provide a generalized framework for GNNs and aid in the discovery and design of materials with tailored properties for various applications.
    SleepEGAN: A GAN-enhanced Ensemble Deep Learning Model for Imbalanced Classification of Sleep Stages. (arXiv:2307.05362v1 [eess.SP])
    Deep neural networks have played an important role in automatic sleep stage classification because of their strong representation and in-model feature transformation abilities. However, class imbalance and individual heterogeneity which typically exist in raw EEG signals of sleep data can significantly affect the classification performance of any machine learning algorithms. To solve these two problems, this paper develops a generative adversarial network (GAN)-powered ensemble deep learning model, named SleepEGAN, for the imbalanced classification of sleep stages. To alleviate class imbalance, we propose a new GAN (called EGAN) architecture adapted to the features of EEG signals for data augmentation. The generated samples for the minority classes are used in the training process. In addition, we design a cost-free ensemble learning strategy to reduce the model estimation variance caused by the heterogeneity between the validation and test sets, so as to enhance the accuracy and robustness of prediction performance. We show that the proposed method can improve classification accuracy compared to several existing state-of-the-art methods using three public sleep datasets.
    Stochastic Nested Compositional Bi-level Optimization for Robust Feature Learning. (arXiv:2307.05384v1 [math.OC])
    We develop and analyze stochastic approximation algorithms for solving nested compositional bi-level optimization problems. These problems involve a nested composition of $T$ potentially non-convex smooth functions in the upper-level, and a smooth and strongly convex function in the lower-level. Our proposed algorithm does not rely on matrix inversions or mini-batches and can achieve an $\epsilon$-stationary solution with an oracle complexity of approximately $\tilde{O}_T(1/\epsilon^{2})$, assuming the availability of stochastic first-order oracles for the individual functions in the composition and the lower-level, which are unbiased and have bounded moments. Here, $\tilde{O}_T$ hides polylog factors and constants that depend on $T$. The key challenge we address in establishing this result relates to handling three distinct sources of bias in the stochastic gradients. The first source arises from the compositional nature of the upper-level, the second stems from the bi-level structure, and the third emerges due to the utilization of Neumann series approximations to avoid matrix inversion. To demonstrate the effectiveness of our approach, we apply it to the problem of robust feature learning for deep neural networks under covariate shift, showcasing the benefits and advantages of our methodology in that context.
    Choosing Well Your Opponents: How to Guide the Synthesis of Programmatic Strategies. (arXiv:2307.04893v1 [cs.LG])
    This paper introduces Local Learner (2L), an algorithm for providing a set of reference strategies to guide the search for programmatic strategies in two-player zero-sum games. Previous learning algorithms, such as Iterated Best Response (IBR), Fictitious Play (FP), and Double-Oracle (DO), can be computationally expensive or miss important information for guiding search algorithms. 2L actively selects a set of reference strategies to improve the search signal. We empirically demonstrate the advantages of our approach while guiding a local search algorithm for synthesizing strategies in three games, including MicroRTS, a challenging real-time strategy game. Results show that 2L learns reference strategies that provide a stronger search signal than IBR, FP, and DO. We also simulate a tournament of MicroRTS, where a synthesizer using 2L outperformed the winners of the two latest MicroRTS competitions, which were programmatic strategies written by human programmers.  ( 2 min )
    Optimized Crystallographic Graph Generation for Material Science. (arXiv:2307.05380v1 [cond-mat.mtrl-sci])
    Graph neural networks are widely used in machine learning applied to chemistry, and in particular for materials science discovery. For crystalline materials, however, generating a graph-based representation from geometrical information for neural networks is not a trivial task. The periodicity of crystalline structures requires efficient implementations to be processed in real time in a massively parallel environment. With the aim of training graph-based generative models for new material discovery, we propose an efficient tool to generate cutoff graphs and k-nearest-neighbour graphs of periodic structures with GPU optimization. We provide pyMatGraph, a PyTorch-compatible framework to generate graphs in real time during the training of a neural network architecture. Our tool can update the graph of a structure, enabling generative models to update the geometry and process the updated graph during the forward propagation on the GPU side. Our code is publicly available at https://github.com/aklipf/mat-graph.  ( 2 min )
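    To show what makes periodic graph construction non-trivial, here is a hedged numpy sketch of a cutoff graph for a crystal: atoms are replicated into neighboring image cells via lattice translations before edges are collected. pyMatGraph's GPU implementation is far more efficient; this only illustrates the underlying geometry on a toy cubic cell.

```python
# Cutoff-graph construction for a periodic structure (CPU, illustrative only).
import numpy as np
from itertools import product

def periodic_cutoff_edges(frac_coords, lattice, cutoff=5.0):
    """frac_coords: (n, 3) fractional positions; lattice: (3, 3) row vectors."""
    cart = frac_coords @ lattice
    edges = []
    for shift in product((-1, 0, 1), repeat=3):            # 27 neighbor cells
        offset = np.array(shift) @ lattice                  # image translation
        d = np.linalg.norm(cart[None, :, :] + offset - cart[:, None, :], axis=-1)
        src, dst = np.nonzero((d < cutoff) & (d > 1e-8))    # drop zero self-distance
        edges.extend(zip(src.tolist(), dst.tolist()))
    return np.unique(np.array(edges), axis=0)               # (num_edges, 2)

lattice = 4.05 * np.eye(3)                                  # toy cubic cell
frac = np.array([[0, 0, 0], [0.5, 0.5, 0], [0.5, 0, 0.5], [0, 0.5, 0.5]])
print(periodic_cutoff_edges(frac, lattice, cutoff=3.0).shape)
```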
    Discovering Symbolic Laws Directly from Trajectories with Hamiltonian Graph Neural Networks. (arXiv:2307.05299v1 [cs.LG])
    The time evolution of physical systems is described by differential equations, which depend on abstract quantities like energy and force. Traditionally, these quantities are derived as functionals based on observables such as positions and velocities. Discovering these governing symbolic laws is the key to comprehending the interactions in nature. Here, we present a Hamiltonian graph neural network (HGNN), a physics-enforced GNN that learns the dynamics of systems directly from their trajectories. We demonstrate the performance of HGNN on n-springs, n-pendulums, gravitational systems, and binary Lennard-Jones systems; HGNN learns the dynamics in excellent agreement with the ground truth from small amounts of data. We also evaluate the ability of HGNN to generalize to larger system sizes, and to a hybrid spring-pendulum system that combines two of the original systems (spring and pendulum) on which the models were trained independently. Finally, employing symbolic regression on the learned HGNN, we infer the underlying equations relating the energy functionals, even for complex systems such as the binary Lennard-Jones liquid. Our framework facilitates the interpretable discovery of interaction laws directly from physical system trajectories. Furthermore, this approach can be extended to other systems with topology-dependent dynamics, such as cells, polydisperse gels, or deformable bodies.  ( 2 min )
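    The core physics-enforcing trick can be sketched compactly: a learned scalar $H(q, p)$ whose autograd gradients give the dynamics $\dot{q} = \partial H/\partial p$, $\dot{p} = -\partial H/\partial q$. HGNN replaces the MLP below with a graph network over the system's topology; this stand-in only illustrates the Hamiltonian structure, under assumed layer sizes.

```python
# Hamiltonian network: dynamics derived from a learned energy via autograd.
import torch
import torch.nn as nn

class HamiltonianNet(nn.Module):
    def __init__(self, n_dof):
        super().__init__()
        self.H = nn.Sequential(
            nn.Linear(2 * n_dof, 64), nn.Softplus(),
            nn.Linear(64, 64), nn.Softplus(),
            nn.Linear(64, 1),
        )

    def time_derivatives(self, q, p):
        q = q.requires_grad_(True)
        p = p.requires_grad_(True)
        H = self.H(torch.cat([q, p], dim=-1)).sum()
        dHdq, dHdp = torch.autograd.grad(H, (q, p), create_graph=True)
        return dHdp, -dHdq        # Hamilton's equations: (dq/dt, dp/dt)

net = HamiltonianNet(n_dof=4)
q, p = torch.randn(8, 4), torch.randn(8, 4)
dq, dp = net.time_derivatives(q, p)
# Training would regress (dq, dp) onto finite-difference velocities from
# observed trajectories, baking energy-conserving structure into the model.
print(dq.shape, dp.shape)
```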
    Automated Detection of Gait Events and Travel Distance Using Waist-worn Accelerometers Across a Typical Range of Walking and Running Speeds. (arXiv:2307.04866v1 [eess.SP])
    Background: Estimation of temporospatial clinical features of gait (CFs), such as step count and length, step duration, step frequency, gait speed and distance traveled, is an important component of community-based mobility evaluation using wearable accelerometers. However, challenges arising from device complexity and availability, cost, and analytical methodology have limited widespread application of such tools. Research Question: Can accelerometer data from commercially-available smartphones be used to extract gait CFs across a broad range of attainable gait velocities in children with Duchenne muscular dystrophy (DMD) and typically developing controls (TDs) using machine learning (ML)-based methods? Methods: Fifteen children with DMD and 15 TDs underwent supervised clinical testing across a range of gait speeds using 10 or 25m run/walk (10MRW, 25MRW), 100m run/walk (100MRW), 6-minute walk (6MWT) and free-walk (FW) evaluations while wearing a mobile phone-based accelerometer at the waist near the body's center of mass. Gait CFs were extracted from the accelerometer data using a multi-step machine learning-based process and results were compared to ground-truth observation data. Results: Model predictions vs. observed values for step counts, distance traveled, and step length showed a strong correlation (Pearson's r = -0.9929 to 0.9986, p<0.0001). The estimates demonstrated a mean (SD) percentage error of 1.49% (7.04%) for step counts, 1.18% (9.91%) for distance traveled, and 0.37% (7.52%) for step length compared to ground truth observations for the combined 6MWT, 100MRW, and FW tasks. Significance: The study findings indicate that a single accelerometer placed near the body's center of mass can accurately measure CFs across different gait speeds in both TD and DMD peers, suggesting that there is potential for accurately measuring CFs in the community with consumer-level smartphones.  ( 3 min )
    Articulated 3D Head Avatar Generation using Text-to-Image Diffusion Models. (arXiv:2307.04859v1 [cs.CV])
    The ability to generate diverse 3D articulated head avatars is vital to a plethora of applications, including augmented reality, cinematography, and education. Recent work on text-guided 3D object generation has shown great promise in addressing these needs. These methods directly leverage pre-trained 2D text-to-image diffusion models to generate 3D-multi-view-consistent radiance fields of generic objects. However, due to the lack of geometry and texture priors, these methods have limited control over the generated 3D objects, making it difficult to operate inside a specific domain, e.g., human heads. In this work, we develop a new approach to text-guided 3D head avatar generation to address this limitation. Our framework directly operates on the geometry and texture of an articulable 3D morphable model (3DMM) of a head, and introduces novel optimization procedures to update the geometry and texture while keeping the 2D and 3D facial features aligned. The result is a 3D head avatar that is consistent with the text description and can be readily articulated using the deformation model of the 3DMM. We show that our diffusion-based articulated head avatars outperform state-of-the-art approaches for this task. The latter are typically based on CLIP, which is known to provide limited diversity of generation and accuracy for 3D object generation.  ( 2 min )
    Realising Synthetic Active Inference Agents, Part II: Variational Message Updates. (arXiv:2306.02733v2 [stat.ML] UPDATED)
    The Free Energy Principle (FEP) describes (biological) agents as minimising a variational Free Energy (FE) with respect to a generative model of their environment. Active Inference (AIF) is a corollary of the FEP that describes how agents explore and exploit their environment by minimising an expected FE objective. In two related papers, we describe a scalable, epistemic approach to synthetic AIF agents, by message passing on free-form Forney-style Factor Graphs (FFGs). A companion paper (part I) introduces a Constrained FFG (CFFG) notation that visually represents (generalised) FE objectives for AIF. The current paper (part II) derives message passing algorithms that minimise (generalised) FE objectives on a CFFG by variational calculus. A comparison between simulated Bethe and generalised FE agents illustrates how synthetic AIF induces epistemic behaviour on a T-maze navigation task. With a full message passing account of synthetic AIF agents, it becomes possible to derive and reuse message updates across models and move closer to industrial applications of synthetic AIF.  ( 2 min )
    DyCL: Dynamic Neural Network Compilation Via Program Rewriting and Graph Optimization. (arXiv:2307.04963v1 [cs.CL])
    A DL compiler's primary function is to translate DNN programs written in high-level DL frameworks such as PyTorch and TensorFlow into portable executables. These executables can then be flexibly executed by the deployed host programs. However, existing DL compilers rely on a tracing mechanism, which involves feeding a runtime input to a neural network program and tracing the program execution paths to generate the computational graph necessary for compilation. Unfortunately, this mechanism falls short when dealing with modern dynamic neural networks (DyNNs) that possess varying computational graphs depending on the inputs. Consequently, conventional DL compilers struggle to accurately compile DyNNs into executable code. To address this limitation, we propose DyCL, a general approach that enables any existing DL compiler to successfully compile DyNNs. DyCL tackles the dynamic nature of DyNNs by introducing a compilation mechanism that redistributes the control and data flow of the original DNN programs during the compilation process. Specifically, DyCL develops program analysis and program transformation techniques to convert a dynamic neural network into multiple sub-neural networks. Each sub-neural network is devoid of conditional statements and is compiled independently. Furthermore, DyCL synthesizes a host module that models the control flow of the DyNNs and facilitates the invocation of the sub-neural networks. Our evaluation demonstrates the effectiveness of DyCL, achieving a 100% success rate in compiling all dynamic neural networks. Moreover, the compiled executables generated by DyCL exhibit significantly improved performance, running between $1.12\times$ and $20.21\times$ faster than the original DyNNs executed on general-purpose DL frameworks.  ( 3 min )
    Perturbed-History Exploration in Stochastic Linear Bandits. (arXiv:1903.09132v2 [cs.LG] UPDATED)
    We propose a new online algorithm for cumulative regret minimization in a stochastic linear bandit. The algorithm pulls the arm with the highest estimated reward in a linear model trained on its perturbed history. Therefore, we call it perturbed-history exploration in a linear bandit (LinPHE). The perturbed history is a mixture of observed rewards and randomly generated i.i.d. pseudo-rewards. We derive a $\tilde{O}(d \sqrt{n})$ gap-free bound on the $n$-round regret of LinPHE, where $d$ is the number of features. The key steps in our analysis are new concentration and anti-concentration bounds on the weighted sum of Bernoulli random variables. To show the generality of our design, we generalize LinPHE to a logistic model. We evaluate our algorithms empirically and show that they are practical.  ( 2 min )
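    The algorithm is simple enough to sketch end to end; the toy environment, perturbation scale, and regularization below are illustrative assumptions, not the paper's experimental setup.

```python
# Perturbed-history exploration in a linear bandit (LinPHE), numpy sketch.
import numpy as np

def linphe(arms, true_theta, T=2000, a=1, lam=1.0, seed=0):
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    X, y = [], []
    for t in range(T):
        if not X:
            idx = rng.integers(len(arms))          # first pull: random arm
        else:
            Xh, yh = np.array(X), np.array(y)
            # a i.i.d. Bernoulli pseudo-rewards per observed round
            Xp = np.repeat(Xh, a, axis=0)
            yp = rng.integers(0, 2, size=len(Xp)).astype(float)
            Xa, ya = np.vstack([Xh, Xp]), np.concatenate([yh, yp])
            # ridge fit on the perturbed history, then act greedily
            theta = np.linalg.solve(Xa.T @ Xa + lam * np.eye(d), Xa.T @ ya)
            idx = int(np.argmax(arms @ theta))
        x = arms[idx]
        reward = float(rng.random() < x @ true_theta)   # Bernoulli reward
        X.append(x); y.append(reward)
    return np.mean(y)

rng = np.random.default_rng(0)
arms = rng.normal(size=(20, 5)); arms /= np.linalg.norm(arms, axis=1, keepdims=True)
theta = np.abs(rng.normal(size=5)); theta /= (2 * np.linalg.norm(theta))
print(f"avg reward: {linphe(arms, theta):.3f}")
```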
    Test-Time Training on Video Streams. (arXiv:2307.05014v1 [cs.CV])
    Prior work has established test-time training (TTT) as a general framework to further improve a trained model at test time. Before making a prediction on each test instance, the model is trained on the same instance using a self-supervised task, such as image reconstruction with masked autoencoders. We extend TTT to the streaming setting, where multiple test instances - video frames in our case - arrive in temporal order. Our extension is online TTT: The current model is initialized from the previous model, then trained on the current frame and a small window of frames immediately before. Online TTT significantly outperforms the fixed-model baseline for four tasks, on three real-world datasets. The relative improvement is 45% and 66% for instance and panoptic segmentation. Surprisingly, online TTT also outperforms its offline variant that accesses more information, training on all frames from the entire test video regardless of temporal order. This differs from previous findings using synthetic videos. We conceptualize locality as the advantage of online over offline TTT. We analyze the role of locality with ablations and a theory based on bias-variance trade-off.  ( 2 min )
    Accelerated Discovery of Machine-Learned Symmetries: Deriving the Exceptional Lie Groups G2, F4 and E6. (arXiv:2307.04891v1 [hep-th])
    Recent work has applied supervised deep learning to derive continuous symmetry transformations that preserve the data labels and to obtain the corresponding algebras of symmetry generators. This letter introduces two improved algorithms that significantly speed up the discovery of these symmetry transformations. The new methods are demonstrated by deriving the complete set of generators for the unitary groups U(n) and the exceptional Lie groups $G_2$, $F_4$, and $E_6$. A third post-processing algorithm renders the found generators in sparse form. We benchmark the performance improvement of the new algorithms relative to the standard approach. Given the significant complexity of the exceptional Lie groups, our results demonstrate that this machine-learning method for discovering symmetries is completely general and can be applied to a wide variety of labeled datasets.  ( 2 min )
    Monotone deep Boltzmann machines. (arXiv:2307.04990v1 [cs.LG])
    Deep Boltzmann machines (DBMs), one of the first "deep" learning methods ever studied, are multi-layered probabilistic models governed by a pairwise energy function that describes the likelihood of all variables/nodes in the network. In practice, DBMs are often constrained, i.e., via the restricted Boltzmann machine (RBM) architecture (which does not permit intra-layer connections), in order to allow for more efficient inference. In this work, we revisit the generic DBM approach, and ask the question: are there other possible restrictions to their design that would enable efficient (approximate) inference? In particular, we develop a new class of restricted model, the monotone DBM, which allows for arbitrary self-connection in each layer, but restricts the weights in a manner that guarantees the existence and global uniqueness of a mean-field fixed point. To do this, we leverage tools from the recently-proposed monotone Deep Equilibrium model and show that a particular choice of activation results in a fixed-point iteration that gives a variational mean-field solution. While this approach is still largely conceptual, it is the first architecture that allows for efficient approximate inference in fully-general weight structures for DBMs. We apply this approach to simple deep convolutional Boltzmann architectures and demonstrate that it allows for tasks such as the joint completion and classification of images, within a single deep probabilistic setting, while avoiding the pitfalls of mean-field inference in traditional RBMs.  ( 2 min )
    Human Emotion Recognition Based On Galvanic Skin Response signal Feature Selection and SVM. (arXiv:2307.05383v1 [eess.SP])
    A novel human emotion recognition method based on automatically selected Galvanic Skin Response (GSR) signal features and SVM is proposed in this paper. GSR signals were acquired by the e-Health Sensor Platform V2.0. The data were then de-noised with a wavelet function and normalized to remove individual differences. 30 features are extracted from the normalized data; however, directly using these features leads to a low recognition rate. In order to obtain an optimized feature set, a covariance-based feature selection is employed in our method. Finally, an SVM taking the optimized features as input is used to perform the human emotion recognition. The experimental results indicate that the proposed method achieves good human emotion recognition, with a recognition accuracy of more than 66.67%.  ( 2 min )
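    The pipeline's shape can be sketched with scikit-learn; here the wavelet de-noising is replaced by simple standardization and the covariance-based selector by a univariate stand-in (SelectKBest), since the paper's exact selector is not reproduced. The data are synthetic placeholders for the 30 GSR features per trial.

```python
# Feature selection + SVM emotion classifier (schematic stand-in pipeline).
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 30))            # 120 trials x 30 extracted GSR features
y = rng.integers(0, 4, size=120)          # e.g., four emotion classes

clf = Pipeline([
    ("scale", StandardScaler()),               # normalize away individual differences
    ("select", SelectKBest(f_classif, k=12)),  # keep the most informative features
    ("svm", SVC(kernel="rbf", C=1.0)),
])
scores = cross_val_score(clf, X, y, cv=5)
print(f"accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```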
    A physics-constrained machine learning method for mapping gapless land surface temperature. (arXiv:2307.04817v1 [physics.ao-ph])
    More accurate, spatio-temporally continuous, and physically consistent land surface temperature (LST) estimation has been a main interest in Earth system research. Developing physics-driven mechanism models and data-driven machine learning (ML) models are two major paradigms for gapless LST estimation, which have their respective advantages and disadvantages. In this paper, a physics-constrained ML model, which combines the strengths of the mechanism model and the ML model, is proposed to generate gapless LST with physical meaning and high accuracy. The hybrid model employs ML as the primary architecture, under which the input variable physical constraints are incorporated to enhance the interpretability and extrapolation ability of the model. Specifically, the light gradient-boosting machine (LGBM) model, which uses only remote sensing data as input, serves as the pure ML model. Physical constraints (PCs) are coupled by further incorporating key Community Land Model (CLM) forcing data (cause) and CLM simulation data (effect) as inputs into the LGBM model. This integration forms the PC-LGBM model, which incorporates surface energy balance (SEB) constraints underlying the data in CLM-LST modeling within a biophysical framework. Compared with a pure physical method and pure ML methods, the PC-LGBM model improves the prediction accuracy and physical interpretability of LST. It also demonstrates a good extrapolation ability for the responses to extreme weather cases, suggesting that the PC-LGBM model not only learns empirically from data but is also rationally derived from theory. The proposed method represents an innovative way to map accurate and physically interpretable gapless LST, and could provide insights to accelerate knowledge discovery in land surface processes and data mining in geographical parameter estimation.  ( 3 min )
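    The coupling idea reduces to a feature-level construction and can be sketched with the lightgbm package; the feature groups, sizes, and synthetic target below are illustrative assumptions standing in for the real remote-sensing and CLM inputs.

```python
# Pure-ML LGBM vs. physics-constrained PC-LGBM (schematic comparison).
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
n = 5000
remote_sensing = rng.normal(size=(n, 6))    # e.g., bands, NDVI, elevation
clm_forcing = rng.normal(size=(n, 4))       # cause: radiation, wind, humidity...
clm_lst = rng.normal(size=(n, 1))           # effect: CLM-simulated LST
y = remote_sensing[:, 0] + 0.5 * clm_lst[:, 0] + 0.1 * rng.normal(size=n)

X_pure = remote_sensing                                   # pure-ML baseline
X_pc = np.hstack([remote_sensing, clm_forcing, clm_lst])  # physics-constrained inputs

for name, X in [("LGBM", X_pure), ("PC-LGBM", X_pc)]:
    model = lgb.LGBMRegressor(n_estimators=300, learning_rate=0.05)
    model.fit(X[:4000], y[:4000])
    rmse = np.sqrt(np.mean((model.predict(X[4000:]) - y[4000:]) ** 2))
    print(f"{name}: RMSE {rmse:.3f}")
```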
    Predicting Outcomes in Long COVID Patients with Spatiotemporal Attention. (arXiv:2307.04770v1 [cs.LG])
    Long COVID is a general term for the post-acute sequelae of COVID-19. Patients with long COVID can endure long-lasting symptoms including fatigue, headache, dyspnea and anosmia, etc. Identifying the cohorts with severe long-term complications of COVID-19 could benefit treatment planning and resource arrangement. However, due to the heterogeneous phenotypes presented in long COVID patients, it is difficult to predict their outcomes from their longitudinal data. In this study, we proposed a spatiotemporal attention mechanism to weigh feature importance jointly from the temporal dimension and feature space. Considering that medical examinations can have interchangeable orders in adjacent time points, we restricted the learning of short-term dependency with a Local-LSTM and the learning of long-term dependency with the joint spatiotemporal attention. We also compared the proposed method with several state-of-the-art methods and a method used in clinical practice. The methods were evaluated on a hard-to-acquire clinical dataset of patients with long COVID. Experimental results show the Local-LSTM with joint spatiotemporal attention outperformed related methods in outcome prediction. The proposed method provides a clinical tool for the severity assessment of long COVID.  ( 2 min )
    Collaborative Score Distillation for Consistent Visual Synthesis. (arXiv:2307.04787v1 [cs.CV])
    Generative priors of large-scale text-to-image diffusion models enable a wide range of new generation and editing applications on diverse visual modalities. However, when adapting these priors to complex visual modalities, often represented as multiple images (e.g., video), achieving consistency across a set of images is challenging. In this paper, we address this challenge with a novel method, Collaborative Score Distillation (CSD). CSD is based on the Stein Variational Gradient Descent (SVGD). Specifically, we propose to consider multiple samples as "particles" in the SVGD update and combine their score functions to distill generative priors over a set of images synchronously. Thus, CSD facilitates seamless integration of information across 2D images, leading to a consistent visual synthesis across multiple samples. We show the effectiveness of CSD in a variety of tasks, encompassing the visual editing of panorama images, videos, and 3D scenes. Our results underline the competency of CSD as a versatile method for enhancing inter-sample consistency, thereby broadening the applicability of text-to-image diffusion models.  ( 2 min )
    On Detecting Some Defective Items in Group Testing. (arXiv:2307.04822v1 [cs.DS])
    Group testing is an approach aimed at identifying up to $d$ defective items among a total of $n$ elements. This is accomplished by examining subsets to determine if at least one defective item is present. In our study, we focus on the problem of identifying a subset of $\ell\leq d$ defective items. We develop upper and lower bounds on the number of tests required to detect $\ell$ defective items in both the adaptive and non-adaptive settings while considering scenarios where no prior knowledge of $d$ is available, and situations where an estimate of $d$ or at least some non-trivial upper bound on $d$ is available. When no prior knowledge on $d$ is available, we prove a lower bound of $ \Omega(\frac{\ell \log^2n}{\log \ell +\log\log n})$ tests in the randomized non-adaptive settings and an upper bound of $O(\ell \log^2 n)$ for the same settings. Furthermore, we demonstrate that any non-adaptive deterministic algorithm must ask $\Theta(n)$ tests, signifying a fundamental limitation in this scenario. For adaptive algorithms, we establish tight bounds in different scenarios. In the deterministic case, we prove a tight bound of $\Theta(\ell\log{(n/\ell)})$. Moreover, in the randomized settings, we derive a tight bound of $\Theta(\ell\log{(n/d)})$. When $d$, or at least some non-trivial estimate of $d$, is known, we prove a tight bound of $\Theta(d\log (n/d))$ for the deterministic non-adaptive settings, and $\Theta(\ell\log(n/d))$ for the randomized non-adaptive settings. In the adaptive case, we present an upper bound of $O(\ell \log (n/\ell))$ for the deterministic settings, and a lower bound of $\Omega(\ell\log(n/d)+\log n)$. Additionally, we establish a tight bound of $\Theta(\ell \log(n/d))$ for the randomized adaptive settings.  ( 3 min )
    Comparison of Point Cloud and Image-based Models for Calorimeter Fast Simulation. (arXiv:2307.04780v1 [cs.LG])
    Score-based generative models are a new class of generative models that have been shown to accurately generate high-dimensional calorimeter datasets. Recent advances in generative models have used images with 3D voxels to represent and model complex calorimeter showers. Point clouds, however, are likely a more natural representation of calorimeter showers, particularly in calorimeters with high granularity. Point clouds preserve all of the information of the original simulation, more naturally deal with sparse datasets, and can be implemented with more compact models and data files. In this work, two state-of-the-art score-based models are trained on the same set of calorimeter simulations and directly compared.  ( 2 min )
    Formulating A Strategic Plan Based On Statistical Analyses And Applications For Financial Companies Through A Real-World Use Case. (arXiv:2307.04778v1 [cs.LG])
    Business statistics play a crucial role in implementing a data-driven strategic plan at the enterprise level to employ various analytics where the outcomes of such a plan enable an enterprise to enhance the decision-making process or to mitigate risks to the organization. In this work, a strategic plan informed by the statistical analysis is introduced for a financial company called LendingClub, where the plan is comprised of exploring the possibility of onboarding a big data platform along with advanced feature selection capacities. The main objectives of such a plan are to increase the company's revenue while reducing the risks of granting loans to borrowers who cannot return their loans. In this study, different hypotheses formulated to address the company's concerns are studied, where the results reveal that the amount of loans profoundly impacts the number of borrowers charging off their loans. Also, the proposed strategic plan includes onboarding advanced analytics such as machine learning technologies that allow the company to build better generalized data-driven predictive models.  ( 2 min )
    MentalHealthAI: Utilizing Personal Health Device Data to Optimize Psychiatry Treatment. (arXiv:2307.04777v1 [cs.LG])
    Mental health disorders remain a significant challenge in modern healthcare, with diagnosis and treatment often relying on subjective patient descriptions and past medical history. To address this issue, we propose a personalized mental health tracking and mood prediction system that utilizes patient physiological data collected through personal health devices. Our system leverages a decentralized learning mechanism that combines transfer and federated machine learning concepts using smart contracts, allowing data to remain on users' devices and enabling effective tracking of mental health conditions for psychiatric treatment and management in a privacy-aware and accountable manner. We evaluate our model using a popular mental health dataset that demonstrates promising results. By utilizing connected health systems and machine learning models, our approach offers a novel solution to the challenge of providing psychiatrists with further insight into their patients' mental health outside of traditional office visits.  ( 2 min )
    Digital Twins for Patient Care via Knowledge Graphs and Closed-Form Continuous-Time Liquid Neural Networks. (arXiv:2307.04772v1 [cs.LG])
    Digital twin technology is anticipated to transform healthcare, enabling personalized medicine and support, earlier diagnoses, simulated treatment outcomes, and optimized surgical plans. Digital twins are readily gaining traction in industries like manufacturing, supply chain logistics, and civil infrastructure, but not yet in patient care. The challenge of modeling complex diseases with multimodal patient data and the computational complexity of analyzing it have stifled digital twin adoption in the biomedical vertical. Yet, these major obstacles can potentially be handled by approaching these models in a different way. This paper proposes a novel framework for addressing the barriers to clinical twin modeling created by computational costs and modeling complexities. We propose structuring patient health data as a knowledge graph and using closed-form continuous-time liquid neural networks for real-time analytics. By synthesizing multimodal patient data and leveraging the flexibility and efficiency of closed-form continuous-time networks and knowledge graph ontologies, our approach enables real-time insights, personalized medicine, early diagnosis and intervention, and optimal surgical planning. This novel approach provides a comprehensive and adaptable view of patient health along with real-time analytics, paving the way for digital twin simulations and other anticipated benefits in healthcare.  ( 2 min )
  • Open

    Optimal Algorithms for Latent Bandits with Cluster Structure. (arXiv:2301.07040v3 [cs.LG] UPDATED)
    We consider the problem of latent bandits with cluster structure where there are multiple users, each with an associated multi-armed bandit problem. These users are grouped into \emph{latent} clusters such that the mean reward vectors of users within the same cluster are identical. At each round, a user, selected uniformly at random, pulls an arm and observes a corresponding noisy reward. The goal of the users is to maximize their cumulative rewards. This problem is central to practical recommendation systems and has received wide attention of late \cite{gentile2014online, maillard2014latent}. Now, if each user acts independently, then they would have to explore each arm independently and a regret of $\Omega(\sqrt{\mathsf{MNT}})$ is unavoidable, where $\mathsf{M}, \mathsf{N}$ are the number of arms and users, respectively. Instead, we propose LATTICE (Latent bAndiTs via maTrIx ComplEtion) which allows exploitation of the latent cluster structure to provide the minimax optimal regret of $\widetilde{O}(\sqrt{(\mathsf{M}+\mathsf{N})\mathsf{T}})$, when the number of clusters is $\widetilde{O}(1)$. This is the first algorithm to guarantee such strong regret bound. LATTICE is based on a careful exploitation of arm information within a cluster while simultaneously clustering users. Furthermore, it is computationally efficient and requires only $O(\log{\mathsf{T}})$ calls to an offline matrix completion oracle across all $\mathsf{T}$ rounds.
    Hybrid hidden Markov LSTM for short-term traffic flow prediction. (arXiv:2307.04954v1 [cs.LG])
    Deep learning (DL) methods have outperformed parametric models such as the historical average, ARIMA and its variants in predicting traffic variables into the short and near-short future, which are critical for traffic management. Specifically, recurrent neural networks (RNNs) and their variants (e.g., long short-term memory) are designed to retain long-term temporal correlations and are therefore suitable for modeling sequences. Multi-regime models, in contrast, assume the traffic system to evolve through multiple states (say, free-flow and congestion) with distinct characteristics, and hence separate models are trained to characterize the traffic dynamics within each regime. For instance, a Markov-switching model with a hidden Markov model (HMM) for regime identification is capable of capturing complex dynamic patterns and non-stationarity. Interestingly, both HMMs and LSTMs can be used for modeling an observation sequence from a set of latent, or hidden, state variables. In an LSTM, the latent variable is computed deterministically from the current observation and the previous latent variable, while in an HMM the sequence of latent variables forms a Markov chain. Inspired by research in natural language processing, a hybrid hidden Markov-LSTM model capable of learning complementary features in traffic data is proposed for traffic flow prediction. Results indicate significant performance gains from the hybrid architecture compared to conventional methods such as Markov-switching ARIMA and LSTM.
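    One plausible way to combine the two components is to feed HMM regime posteriors to the LSTM as extra inputs; the sketch below does exactly that with hmmlearn and PyTorch. Two regimes, the layer sizes, and the toy flow series are illustrative assumptions, not the paper's configuration.

```python
# Hybrid HMM-LSTM sketch: regime beliefs from a Gaussian HMM augment LSTM inputs.
import numpy as np
import torch
import torch.nn as nn
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
# toy series with two regimes (free-flow vs. congestion-like flow levels)
flow = np.concatenate([rng.normal(60, 5, 300), rng.normal(25, 8, 300)]).reshape(-1, 1)

hmm = GaussianHMM(n_components=2, covariance_type="diag", n_iter=50)
hmm.fit(flow)
regime_post = hmm.predict_proba(flow)          # (T, 2) latent-state posteriors

features = np.hstack([flow, regime_post])      # raw flow + regime beliefs
x = torch.tensor(features, dtype=torch.float32).unsqueeze(0)  # (1, T, 3)

lstm = nn.LSTM(input_size=3, hidden_size=32, batch_first=True)
head = nn.Linear(32, 1)
out, _ = lstm(x)
next_flow = head(out[:, -1])                   # one-step-ahead prediction
print(next_flow.shape)
```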
    Diagnosing Model Performance Under Distribution Shift. (arXiv:2303.02011v4 [stat.ML] UPDATED)
    Prediction models can perform poorly when deployed to target distributions different from the training distribution. To understand these operational failure modes, we develop a method, called DIstribution Shift DEcomposition (DISDE), to attribute a drop in performance to different types of distribution shifts. Our approach decomposes the performance drop into terms for 1) an increase in harder but frequently seen examples from training, 2) changes in the relationship between features and outcomes, and 3) poor performance on examples infrequent or unseen during training. These terms are defined by fixing a distribution on $X$ while varying the conditional distribution of $Y \mid X$ between training and target, or by fixing the conditional distribution of $Y \mid X$ while varying the distribution on $X$. In order to do this, we define a hypothetical distribution on $X$ consisting of values common in both training and target, over which it is easy to compare $Y \mid X$ and thus predictive performance. We estimate performance on this hypothetical distribution via reweighting methods. Empirically, we show how our method can 1) inform potential modeling improvements across distribution shifts for employment prediction on tabular census data, and 2) help to explain why certain domain adaptation methods fail to improve model performance for satellite image classification.
    MAP- and MLE-Based Teaching. (arXiv:2307.05252v1 [cs.LG])
    Imagine a learner L who tries to infer a hidden concept from a collection of observations. Building on the work [4] of Ferri et al., we assume the learner to be parameterized by priors P(c) and by c-conditional likelihoods P(z|c) where c ranges over all concepts in a given class C and z ranges over all observations in an observation set Z. L is called a MAP-learner (resp. an MLE-learner) if it thinks of a collection S of observations as a random sample and returns the concept with the maximum a-posteriori probability (resp. the concept which maximizes the c-conditional likelihood of S). Depending on whether L assumes that S is obtained from ordered or unordered sampling resp. from sampling with or without replacement, we can distinguish four different sampling modes. Given a target concept c in C, a teacher for a MAP-learner L aims at finding a smallest collection of observations that causes L to return c. This approach leads in a natural manner to various notions of a MAP- or MLE-teaching dimension of a concept class C. Our main results are: We show that this teaching model has some desirable monotonicity properties. We clarify how the four sampling modes are related to each other. As for the (important!) special case, where concepts are subsets of a domain and observations are 0,1-labeled examples, we obtain some additional results. First of all, we characterize the MAP- and MLE-teaching dimension associated with an optimally parameterized MAP-learner graph-theoretically. From this central result, some other ones are easy to derive. It is shown, for instance, that the MLE-teaching dimension is either equal to the MAP-teaching dimension or exceeds the latter by 1. It is shown furthermore that these dimensions can be bounded from above by the so-called antichain number, the VC-dimension and related combinatorial parameters. Moreover they can be computed in polynomial time.
    Normalized mutual information is a biased measure for classification and community detection. (arXiv:2307.01282v1 [cs.SI] CROSS LISTED)
    Normalized mutual information is widely used as a similarity measure for evaluating the performance of clustering and classification algorithms. In this paper, we show that results returned by the normalized mutual information are biased for two reasons: first, because they ignore the information content of the contingency table and, second, because their symmetric normalization introduces spurious dependence on algorithm output. We introduce a modified version of the mutual information that remedies both of these shortcomings. As a practical demonstration of the importance of using an unbiased measure, we perform extensive numerical tests on a basket of popular algorithms for network community detection and show that one's conclusions about which algorithm is best are significantly affected by the biases in the traditional mutual information.
    Robust Inference of Manifold Density and Geometry by Doubly Stochastic Scaling. (arXiv:2209.08004v2 [math.ST] UPDATED)
    The Gaussian kernel and its traditional normalizations (e.g., row-stochastic) are popular approaches for assessing similarities between data points. Yet, they can be inaccurate under high-dimensional noise, especially if the noise magnitude varies considerably across the data, e.g., under heteroskedasticity or outliers. In this work, we investigate a more robust alternative -- the doubly stochastic normalization of the Gaussian kernel. We consider a setting where points are sampled from an unknown density on a low-dimensional manifold embedded in high-dimensional space and corrupted by possibly strong, non-identically distributed, sub-Gaussian noise. We establish that the doubly stochastic affinity matrix and its scaling factors concentrate around certain population forms, and provide corresponding finite-sample probabilistic error bounds. We then utilize these results to develop several tools for robust inference under general high-dimensional noise. First, we derive a robust density estimator that reliably infers the underlying sampling density and can substantially outperform the standard kernel density estimator under heteroskedasticity and outliers. Second, we obtain estimators for the pointwise noise magnitudes, the pointwise signal magnitudes, and the pairwise Euclidean distances between clean data points. Lastly, we derive robust graph Laplacian normalizations that accurately approximate various manifold Laplacians, including the Laplace Beltrami operator, improving over traditional normalizations in noisy settings. We exemplify our results in simulations and on real single-cell RNA-sequencing data. For the latter, we show that in contrast to traditional methods, our approach is robust to variability in technical noise levels across cell types.
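    For intuition, here is a minimal symmetric Sinkhorn-type iteration for the doubly stochastic normalization (find scaling factors d so that d_i K_ij d_j has unit row and column sums); the bandwidth, damping, and stopping rule are illustrative choices rather than the paper's.
        import numpy as np

        def doubly_stochastic_kernel(X, eps=1.0, n_iter=500, tol=1e-8):
            sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
            K = np.exp(-sq / eps)                 # Gaussian kernel
            d = np.ones(len(X))
            for _ in range(n_iter):
                # Damped fixed-point update for d_i = 1 / sum_j K_ij d_j.
                d_new = np.sqrt(d / (K @ d))
                if np.max(np.abs(d_new - d)) < tol:
                    d = d_new
                    break
                d = d_new
            return d[:, None] * K * d[None, :], d  # rows and columns sum to ~1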
    Selective Sampling and Imitation Learning via Online Regression. (arXiv:2307.04998v1 [cs.LG])
    We consider the problem of Imitation Learning (IL) by actively querying a noisy expert for feedback. While imitation learning has been empirically successful, much of the prior work assumes access to noiseless expert feedback, which is not practical in many applications. In fact, when one only has access to noisy expert feedback, algorithms that rely on purely offline data (non-interactive IL) can be shown to need a prohibitively large number of samples to be successful. In contrast, in this work, we provide an interactive algorithm for IL that uses selective sampling to actively query the noisy expert for feedback. Our contributions are twofold: First, we provide a new selective sampling algorithm that works with general function classes and multiple actions, and obtains the best-known bounds for the regret and the number of queries. Next, we extend this analysis to the problem of IL with noisy expert feedback and provide a new IL algorithm that makes limited queries. Our algorithm for selective sampling leverages function approximation, and relies on an online regression oracle w.r.t. the given model class to predict actions, and to decide whether to query the expert for its label. On the theoretical side, the regret bound of our algorithm is upper bounded by the regret of the online regression oracle, while the query complexity additionally depends on the eluder dimension of the model class. We complement this with a lower bound that demonstrates that our results are tight. We extend our selective sampling algorithm for IL with general function approximation and provide bounds on both the regret and the number of queries made to the noisy expert. A key novelty here is that our regret and query complexity bounds only depend on the number of times the optimal policy (and not the noisy expert, or the learner) goes to states that have a small margin.
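    A toy illustration of the selective-sampling principle (heavily simplified relative to the paper's oracle-based algorithm and its guarantees, and restricted to binary actions for brevity): query the noisy expert only when the current model's prediction margin at the observed state is small. The stream, expert, and threshold below are hypothetical.
        import numpy as np
        from sklearn.linear_model import SGDClassifier

        def selective_imitation(stream, expert, margin_threshold=0.2):
            model = SGDClassifier(loss="log_loss")
            initialized, n_queries = False, 0
            for x in stream:                        # x: feature vector for a state
                if initialized:
                    probs = np.sort(model.predict_proba([x])[0])
                    margin = probs[-1] - probs[-2]  # gap between the top two actions
                else:
                    margin = 0.0                    # always query until initialized
                if margin < margin_threshold:       # uncertain -> query the noisy expert
                    a = expert(x)
                    model.partial_fit([x], [a], classes=[0, 1])
                    initialized, n_queries = True, n_queries + 1
            return model, n_queries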
    The Statistical Complexity of Interactive Decision Making. (arXiv:2112.13487v3 [cs.LG] UPDATED)
    A fundamental challenge in interactive learning and decision making, ranging from bandit problems to reinforcement learning, is to provide sample-efficient, adaptive learning algorithms that achieve near-optimal regret. This question is analogous to the classical problem of optimal (supervised) statistical learning, where there are well-known complexity measures (e.g., VC dimension and Rademacher complexity) that govern the statistical complexity of learning. However, characterizing the statistical complexity of interactive learning is substantially more challenging due to the adaptive nature of the problem. The main result of this work provides a complexity measure, the Decision-Estimation Coefficient, that is proven to be both necessary and sufficient for sample-efficient interactive learning. In particular, we provide: 1. a lower bound on the optimal regret for any interactive decision making problem, establishing the Decision-Estimation Coefficient as a fundamental limit. 2. a unified algorithm design principle, Estimation-to-Decisions (E2D), which transforms any algorithm for supervised estimation into an online algorithm for decision making. E2D attains a regret bound that matches our lower bound up to dependence on a notion of estimation performance, thereby achieving optimal sample-efficient learning as characterized by the Decision-Estimation Coefficient. Taken together, these results constitute a theory of learnability for interactive decision making. When applied to reinforcement learning settings, the Decision-Estimation Coefficient recovers essentially all existing hardness results and lower bounds. More broadly, the approach can be viewed as a decision-theoretic analogue of the classical Le Cam theory of statistical estimation; it also unifies a number of existing approaches -- both Bayesian and frequentist.
    Realising Synthetic Active Inference Agents, Part II: Variational Message Updates. (arXiv:2306.02733v2 [stat.ML] UPDATED)
    The Free Energy Principle (FEP) describes (biological) agents as minimising a variational Free Energy (FE) with respect to a generative model of their environment. Active Inference (AIF) is a corollary of the FEP that describes how agents explore and exploit their environment by minimising an expected FE objective. In two related papers, we describe a scalable, epistemic approach to synthetic AIF agents, by message passing on free-form Forney-style Factor Graphs (FFGs). A companion paper (part I) introduces a Constrained FFG (CFFG) notation that visually represents (generalised) FE objectives for AIF. The current paper (part II) derives message passing algorithms that minimise (generalised) FE objectives on a CFFG by variational calculus. A comparison between simulated Bethe and generalised FE agents illustrates how synthetic AIF induces epistemic behaviour on a T-maze navigation task. With a full message passing account of synthetic AIF agents, it becomes possible to derive and reuse message updates across models and move closer to industrial applications of synthetic AIF.
    Randomized Exploration in Generalized Linear Bandits. (arXiv:1906.08947v3 [cs.LG] UPDATED)
    We study two randomized algorithms for generalized linear bandits. The first, GLM-TSL, samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution. The second, GLM-FPL, fits a GLM to a randomly perturbed history of past rewards. We analyze both algorithms and derive $\tilde{O}(d \sqrt{n \log K})$ upper bounds on their $n$-round regret, where $d$ is the number of features and $K$ is the number of arms. The former improves on prior work while the latter is the first for Gaussian noise perturbations in non-linear models. We empirically evaluate both GLM-TSL and GLM-FPL in logistic bandits, and apply GLM-FPL to neural network bandits. Our work showcases the role of randomization, beyond posterior sampling, in exploration.
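    A minimal sketch of one GLM-TSL round in a logistic bandit (regularization and Newton-step counts are illustrative): compute the MAP estimate, then sample weights from the Laplace approximation N(theta_hat, H^{-1}) and pull the greedy arm.
        import numpy as np

        def glm_tsl_step(A, X_hist, r_hist, lam=1.0):
            """A: (K, d) arm features; X_hist: (n, d) past contexts; r_hist: (n,) 0/1 rewards."""
            d = A.shape[1]
            theta = np.zeros(d)
            for _ in range(50):                              # Newton steps for the MAP estimate
                p = 1.0 / (1.0 + np.exp(-X_hist @ theta))
                g = X_hist.T @ (p - r_hist) + lam * theta
                H = X_hist.T * (p * (1 - p)) @ X_hist + lam * np.eye(d)
                theta -= np.linalg.solve(H, g)
            theta_tilde = np.random.multivariate_normal(theta, np.linalg.inv(H))
            return int(np.argmax(A @ theta_tilde))           # greedy arm under the sampled model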
    Tracking Most Significant Shifts in Nonparametric Contextual Bandits. (arXiv:2307.05341v1 [stat.ML])
    We study nonparametric contextual bandits where Lipschitz mean reward functions may change over time. We first establish the minimax dynamic regret rate in this less understood setting in terms of the number of changes $L$ and the total variation $V$, both capturing all changes in distribution over the context space, and argue that state-of-the-art procedures are suboptimal in this setting. Next, we turn to the question of adaptivity for this setting, i.e. achieving the minimax rate without knowledge of $L$ or $V$. Quite importantly, we posit that the bandit problem, viewed locally at a given context $X_t$, should not be affected by reward changes in other parts of context space $\cal X$. We therefore propose a notion of change, which we term experienced significant shifts, that better accounts for locality, and thus counts considerably fewer changes than $L$ and $V$. Furthermore, similar to recent work on non-stationary MAB (Suk & Kpotufe, 2022), experienced significant shifts only count the most significant changes in mean rewards, e.g., severe best-arm changes relevant to observed contexts. Our main result is to show that this more tolerant notion of change can in fact be adapted to.
    Prediction intervals for neural network models using weighted asymmetric loss functions. (arXiv:2210.04318v4 [stat.ML] UPDATED)
    We propose a simple and efficient approach to generate prediction intervals (PIs) for approximated and forecasted trends. Our method leverages a weighted asymmetric loss function to estimate the lower and upper bounds of the PI, with the weights determined by the target coverage probability. We provide a concise mathematical proof of the method, show how it can be extended to derive PIs for parametrised functions, and argue why the method works for predicting PIs of dependent variables. Tests of the method on a real-world forecasting task using a neural network-based model show that it can produce reliable PIs in complex machine learning scenarios.
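    A compact sketch of the asymmetric-loss idea in PyTorch (pinball-style; the exact weighting in the paper may differ): one network output per bound, with the asymmetry set by the target coverage.
        import torch

        def asymmetric_loss(pred, target, tau):
            # tau near 1 penalizes under-prediction; tau near 0 penalizes over-prediction.
            diff = target - pred
            return torch.mean(torch.maximum(tau * diff, (tau - 1) * diff))

        def pi_loss(lower, upper, target, coverage=0.9):
            a = (1 - coverage) / 2
            return asymmetric_loss(lower, target, a) + asymmetric_loss(upper, target, 1 - a)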
    Stochastic Nested Compositional Bi-level Optimization for Robust Feature Learning. (arXiv:2307.05384v1 [math.OC])
    We develop and analyze stochastic approximation algorithms for solving nested compositional bi-level optimization problems. These problems involve a nested composition of $T$ potentially non-convex smooth functions in the upper-level, and a smooth and strongly convex function in the lower-level. Our proposed algorithm does not rely on matrix inversions or mini-batches and can achieve an $\epsilon$-stationary solution with an oracle complexity of approximately $\tilde{O}_T(1/\epsilon^{2})$, assuming the availability of stochastic first-order oracles for the individual functions in the composition and the lower-level, which are unbiased and have bounded moments. Here, $\tilde{O}_T$ hides polylog factors and constants that depend on $T$. The key challenge we address in establishing this result relates to handling three distinct sources of bias in the stochastic gradients. The first source arises from the compositional nature of the upper-level, the second stems from the bi-level structure, and the third emerges due to the utilization of Neumann series approximations to avoid matrix inversion. To demonstrate the effectiveness of our approach, we apply it to the problem of robust feature learning for deep neural networks under covariate shift, showcasing the benefits and advantages of our methodology in that context.
    The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks. (arXiv:2306.11680v2 [cs.LG] UPDATED)
    We study the implicit bias of batch normalization trained by gradient descent. We show that when learning a linear model with batch normalization for binary classification, gradient descent converges to a uniform margin classifier on the training data with an $\exp(-\Omega(\log^2 t))$ convergence rate. This distinguishes linear models with batch normalization from those without batch normalization in terms of both the type of implicit bias and the convergence rate. We further extend our result to a class of two-layer, single-filter linear convolutional neural networks, and show that batch normalization has an implicit bias towards a patch-wise uniform margin. Based on two examples, we demonstrate that patch-wise uniform margin classifiers can outperform the maximum margin classifiers in certain learning problems. Our results contribute to a better theoretical understanding of batch normalization.
    Comparison of High-Dimensional Bayesian Optimization Algorithms on BBOB. (arXiv:2303.00890v2 [cs.LG] UPDATED)
    Bayesian Optimization (BO) is a class of black-box, surrogate-based heuristics that can efficiently optimize problems that are expensive to evaluate, and hence admit only small evaluation budgets. BO is particularly popular for solving numerical optimization problems in industry, where the evaluation of objective functions often relies on time-consuming simulations or physical experiments. However, many industrial problems depend on a large number of parameters. This poses a challenge for BO algorithms, whose performance is often reported to suffer when the dimension grows beyond 15 variables. Although many new algorithms have been proposed to address this problem, it is not well understood which one is the best for which optimization scenario. In this work, we compare five state-of-the-art high-dimensional BO algorithms, with vanilla BO and CMA-ES on the 24 BBOB functions of the COCO environment at increasing dimensionality, ranging from 10 to 60 variables. Our results confirm the superiority of BO over CMA-ES for limited evaluation budgets and suggest that the most promising approach to improve BO is the use of trust regions. However, we also observe significant performance differences for different function landscapes and budget exploitation phases, indicating improvement potential, e.g., through hybridization of algorithmic components.
    A stochastic optimization approach to minimize robust density power-based divergences for general parametric density models. (arXiv:2307.05251v1 [stat.ME])
    Density power divergence (DPD) [Basu et al. (1998), Biometrika], designed to estimate the underlying distribution of the observations robustly, comprises an integral term of the power of the parametric density models to be estimated. While the explicit form of the integral term can be obtained for some specific densities (such as the normal and exponential densities), its computational intractability has prohibited the application of DPD-based estimation to more general parametric densities for over a quarter of a century since the proposal of DPD. This study proposes a stochastic optimization approach to minimize DPD for general parametric density models and explains its adequacy by referring to conventional theories on stochastic optimization. The proposed approach can also be applied to the minimization of another density power-based $\gamma$-divergence with the aid of unnormalized models [Kanamori and Fujisawa (2015), Biometrika].
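    A hedged sketch of the stochastic approach for a Gaussian model in PyTorch: since $\int f_\theta^{1+\alpha} dx = E_{x \sim f_\theta}[f_\theta(x)^\alpha]$, the integral term can be estimated with reparameterized Monte Carlo samples and minimized by SGD. The value of $\alpha$, the sample sizes, and the toy outlier data are illustrative.
        import torch

        def dpd_loss(data, mu, log_sigma, alpha=0.5, n_mc=256):
            dist = torch.distributions.Normal(mu, log_sigma.exp())
            # Integral term via reparameterized Monte Carlo: int f^(1+a) dx = E_f[f^a].
            x = dist.rsample((n_mc,))
            integral = dist.log_prob(x).mul(alpha).exp().mean()
            # Empirical term on the observed data (theta-independent constant dropped).
            empirical = dist.log_prob(data).mul(alpha).exp().mean()
            return integral - (1 + 1 / alpha) * empirical

        mu = torch.zeros(1, requires_grad=True)
        log_sigma = torch.zeros(1, requires_grad=True)
        opt = torch.optim.Adam([mu, log_sigma], lr=1e-2)
        data = torch.cat([torch.randn(950), 10 + torch.randn(50)])   # 5% gross outliers
        for _ in range(2000):
            opt.zero_grad()
            dpd_loss(data, mu, log_sigma).backward()
            opt.step()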
    BayesFlow: Amortized Bayesian Workflows With Neural Networks. (arXiv:2306.16015v2 [cs.LG] UPDATED)
    Modern Bayesian inference involves a mixture of computational techniques for estimating, validating, and drawing conclusions from probabilistic models as part of principled workflows for data analysis. Typical problems in Bayesian workflows are the approximation of intractable posterior distributions for diverse model types and the comparison of competing models of the same process in terms of their complexity and predictive performance. This manuscript introduces the Python library BayesFlow for simulation-based training of established neural network architectures for amortized data compression and inference. Amortized Bayesian inference, as implemented in BayesFlow, enables users to train custom neural networks on model simulations and re-use these networks for any subsequent application of the models. Since the trained networks can perform inference almost instantaneously, the upfront neural network training is quickly amortized.
    Geometric Neural Diffusion Processes. (arXiv:2307.05431v1 [stat.ML])
    Denoising diffusion models have proven to be a flexible and effective paradigm for generative modelling. Their recent extension to infinite dimensional Euclidean spaces has allowed for the modelling of stochastic processes. However, many problems in the natural sciences incorporate symmetries and involve data living in non-Euclidean spaces. In this work, we extend the framework of diffusion models to incorporate a series of geometric priors in infinite-dimension modelling. We do so by a) constructing a noising process which admits, as limiting distribution, a geometric Gaussian process that transforms under the symmetry group of interest, and b) approximating the score with a neural network that is equivariant w.r.t. this group. We show that with these conditions, the generative functional model admits the same symmetry. We demonstrate scalability and capacity of the model, using a novel Langevin-based conditional sampler, to fit complex scalar and vector fields, with Euclidean and spherical codomain, on synthetic and real-world weather data.
    Leveraging Variational Autoencoders for Parameterized MMSE Channel Estimation. (arXiv:2307.05352v1 [eess.SP])
    In this manuscript, we propose to utilize the generative neural network-based variational autoencoder for channel estimation. The variational autoencoder models the underlying true but unknown channel distribution as a conditional Gaussian distribution in a novel way. The derived channel estimator exploits the internal structure of the variational autoencoder to parameterize an approximation of the mean squared error optimal estimator resulting from the conditional Gaussian channel models. We provide a rigorous analysis under which conditions a variational autoencoder-based estimator is mean squared error optimal. We then present considerations that make the variational autoencoder-based estimator practical and propose three different estimator variants that differ in their access to channel knowledge during the training and evaluation phase. In particular, the proposed estimator variant trained solely on noisy pilot observations is particularly noteworthy as it does not require access to noise-free, ground-truth channel data during training or evaluation. Extensive numerical simulations first analyze the internal behavior of the variational autoencoder-based estimators and then demonstrate excellent channel estimation performance compared to related classical and machine learning-based state-of-the-art channel estimators.
    Reinforcement Learning with Non-Cumulative Objective. (arXiv:2307.04957v1 [cs.LG])
    In reinforcement learning, the objective is almost always defined as a \emph{cumulative} function over the rewards along the process. However, there are many optimal control and reinforcement learning problems in various application fields, especially in communications and networking, where the objectives are not naturally expressed as summations of the rewards. In this paper, we recognize the prevalence of non-cumulative objectives in various problems, and propose a modification to existing algorithms for optimizing such objectives. Specifically, we dive into the fundamental building block for many optimal control and reinforcement learning algorithms: the Bellman optimality equation. To optimize a non-cumulative objective, we replace the original summation operation in the Bellman update rule with a generalized operation corresponding to the objective. Furthermore, we provide sufficient conditions on the form of the generalized operation as well as assumptions on the Markov decision process under which the globally optimal convergence of the generalized Bellman updates can be guaranteed. We demonstrate the idea experimentally with the bottleneck objective, i.e., the objectives determined by the minimum reward along the process, on classical optimal control and reinforcement learning tasks, as well as on two network routing problems on maximizing the flow rates.
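    For the bottleneck objective, the recipe amounts to swapping the discounted sum in the Bellman target for a min. A tabular sketch (assuming a gymnasium-style environment with discrete states and actions; hyperparameters are illustrative):
        import numpy as np

        def bottleneck_q_learning(env, episodes=5000, lr=0.1, eps=0.1):
            Q = np.zeros((env.observation_space.n, env.action_space.n))
            for _ in range(episodes):
                s, _ = env.reset()
                done = False
                while not done:
                    a = env.action_space.sample() if np.random.rand() < eps else int(np.argmax(Q[s]))
                    s2, r, term, trunc, _ = env.step(a)
                    done = term or trunc
                    # Generalized Bellman target: min replaces r + gamma * max.
                    target = r if term else min(r, np.max(Q[s2]))
                    Q[s, a] += lr * (target - Q[s, a])
                    s = s2
            return Q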
    Law of Large Numbers for Bayesian two-layer Neural Network trained with Variational Inference. (arXiv:2307.04779v1 [stat.ML])
    We provide a rigorous analysis of training by variational inference (VI) of Bayesian neural networks in the two-layer and infinite-width case. We consider a regression problem with a regularized evidence lower bound (ELBO) which is decomposed into the expected log-likelihood of the data and the Kullback-Leibler (KL) divergence between the a priori distribution and the variational posterior. With an appropriate weighting of the KL, we prove a law of large numbers for three different training schemes: (i) the idealized case with exact estimation of a multiple Gaussian integral from the reparametrization trick, (ii) a minibatch scheme using Monte Carlo sampling, commonly known as Bayes by Backprop, and (iii) a new and computationally cheaper algorithm which we introduce as Minimal VI. An important result is that all methods converge to the same mean-field limit. Finally, we illustrate our results numerically and discuss the need for the derivation of a central limit theorem.
    Perturbed-History Exploration in Stochastic Linear Bandits. (arXiv:1903.09132v2 [cs.LG] UPDATED)
    We propose a new online algorithm for cumulative regret minimization in a stochastic linear bandit. The algorithm pulls the arm with the highest estimated reward in a linear model trained on its perturbed history. Therefore, we call it perturbed-history exploration in a linear bandit (LinPHE). The perturbed history is a mixture of observed rewards and randomly generated i.i.d. pseudo-rewards. We derive a $\tilde{O}(d \sqrt{n})$ gap-free bound on the $n$-round regret of LinPHE, where $d$ is the number of features. The key steps in our analysis are new concentration and anti-concentration bounds on the weighted sum of Bernoulli random variables. To show the generality of our design, we generalize LinPHE to a logistic model. We evaluate our algorithms empirically and show that they are practical.
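    A minimal sketch of one LinPHE round (using Gaussian perturbations for simplicity; the paper analyzes a specific pseudo-reward scheme, so the scale below is illustrative): fit ridge regression on the perturbed history and pull the greedy arm.
        import numpy as np

        def linphe_step(A, X_hist, r_hist, perturb_scale=1.0, lam=1.0):
            """A: (K, d) arm features; X_hist: (n, d) pulled arms; r_hist: (n,) rewards."""
            if len(X_hist) == 0:
                return np.random.randint(len(A))            # pull arbitrarily at the start
            z = perturb_scale * np.random.randn(len(X_hist))  # i.i.d. pseudo-rewards
            G = X_hist.T @ X_hist + lam * np.eye(X_hist.shape[1])
            theta = np.linalg.solve(G, X_hist.T @ (r_hist + z))
            return int(np.argmax(A @ theta))                # greedy arm on the perturbed fit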
    Dynamics of Temporal Difference Reinforcement Learning. (arXiv:2307.04841v1 [stat.ML])
    Reinforcement learning has been successful across several applications in which agents have to learn to act in environments with sparse feedback. However, despite this empirical success, there is still a lack of theoretical understanding of how the parameters of reinforcement learning models and the features used to represent states interact to control the dynamics of learning. In this work, we use concepts from statistical physics to study the typical-case learning curves for temporal difference learning of a value function with linear function approximators. Our theory is derived under a Gaussian equivalence hypothesis, where averages over the random trajectories are replaced with temporally correlated Gaussian feature averages, and we validate our assumptions on small-scale Markov decision processes. We find that the stochastic semi-gradient noise due to subsampling the space of possible episodes leads to significant plateaus in the value error, unlike in traditional gradient descent dynamics. We study how learning dynamics and plateaus depend on feature structure, learning rate, discount factor, and reward function. We then analyze how strategies like learning rate annealing and reward shaping can favorably alter learning dynamics and plateaus. To conclude, our work introduces new tools opening a direction towards developing a theory of learning dynamics in reinforcement learning.

  • Open

    MAMAY AI-powered algorithm digitises taste for food and beverage makers
    submitted by /u/trueslicky [link] [comments]  ( 8 min )
    Jesus in 2023 lmao
    submitted by /u/reinkrestfoxy [link] [comments]  ( 8 min )
    Can AI replicate a character's voice cracks and raspy voice?
    I used RVC V2 to make an AI model of a character whose voice is raspy, has a lot of voice cracks and has a very "shouty" voice. I used a dataset which was 70% of voice clips of him yelling and/or talking loudly and the rest were of him talking normally. The dataset was decent quality, I did have to use background music remover software on some parts but it's overall decent. The thing is, the model doesn't sound ANYTHING like the character. For some reason it's way too soft spoken, and even when it's supposed to be yelling or screaming it sounds kinda like he's whispering. The AI's neutral voice does sound like him but it's missing his voice cracks and voice raspiness. Is there any way I can mimic it? submitted by /u/donutpancito [link] [comments]  ( 8 min )
    Inside the AI Factory (about the underclass work force)
    submitted by /u/facinabush [link] [comments]  ( 8 min )
    Behind the secretive work of the many, many humans helping to train AI
    submitted by /u/facinabush [link] [comments]  ( 8 min )
    if you stare into the abyss, the abyss stares back at you
    I'm scared of AI but this is one of my favourite quotes and the result is magnificent submitted by /u/peditte [link] [comments]  ( 8 min )
    Anthropic releases Claude 2 with 100K token limit
    submitted by /u/wyem [link] [comments]  ( 8 min )
    AI Can Accurately Predict Potentially Fatal Cardiac Events in Firefighters
    submitted by /u/nist [link] [comments]  ( 8 min )
    Report: China to tighten rules around releasing generative AI tools
    submitted by /u/PleasantLiberation [link] [comments]  ( 8 min )
    AI Robots Admit They'd Run Earth Better Than 'Clouded' Humans
    submitted by /u/ChubbyBrunch [link] [comments]  ( 8 min )
    One-Minute Daily AI News 7/10/2023
    Just like other large chip designers, AMD has already started to use AI for designing chips. In fact, Lisa Su, chief executive of AMD, believes that eventually, AI-enabled tools will dominate chip design as the complexity of modern processors is increasing exponentially.[1] Comedian Sarah Silverman and two authors are suing Meta and ChatGPT-maker OpenAI, alleging the companies’ AI language models were trained on copyrighted materials from their books without their knowledge or consent.[2] Several hospitals, including the Mayo Clinic, have begun test-driving Google’s Med-PaLM 2, an AI chatbot that is widely expected to shake up the healthcare industry. Med-PaLM 2 is an updated model of PaLM2, which the tech giant announced at Google I/O earlier this year. PaLM 2 is the language model underpinning Google’s AI tool, Bard.[3] Japanese police will begin testing security cameras equipped with AI-based technology to protect high-profile public figures, Nikkei has learned, as the country mourns the anniversary of the fatal shooting of former Prime Minister Shinzo Abe on Saturday. The technology could lead to the detection of suspicious activity, supplementing existing security measures.[4] Sources: [1] https://www.tomshardware.com/news/lisa-su-ai-will-dominate-chip-design [2] https://www.cnn.com/2023/07/10/tech/sarah-silverman-openai-meta-lawsuit/index.html [3] https://www.foxbusiness.com/technology/hospitals-begin-test-driving-googles-medical-ai-chatbot-report [4] https://asia.nikkei.com/Politics/Japan-police-to-test-AI-equipped-cameras-in-protecting-VIPs submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Would disruptive denial-of-service AI be a viable option in internet malpractice, since it's progressed so much?
    title submitted by /u/Icy_Independence_125 [link] [comments]  ( 8 min )
  • Open

    [D] Regression for Everyone
    Check my article on Regression. I'd love to hear your thoughts ☺️ https://medium.com/@sandundayananda/regression-fef89ad7c68f submitted by /u/sandun-dayananda [link] [comments]  ( 8 min )
    [D] Masters of AI/Machine Learning
    I have a Bachelor's in Electrical Engineering: valedictorian, 4.00 out of 4.00 CGPA, IELTS 7.5, and I have been working in the AI center of a research company for 3 months now. I have 2 papers published; one of them is on Falcon 40B. My employer asked me to pursue a master's degree in AI or machine learning. The tuition fees don't matter (fully sponsored). The degree has to be certified as an (MSc of Artificial Intelligence) OR (MSc of Machine Learning), not an (MSc of Computer Science). The university should be QS World Rank 100 to 200. I applied to: 1- Imperial College London 2- Monash University 3- University of Technology Sydney. I need a fourth option. I prefer a university that provides a solid background in computer science, since I have an electrical engineering bachelor's. Also, I must start by spring 2024 at the latest. Thank you ❤️ submitted by /u/Maithah_x [link] [comments]  ( 9 min )
    [P] Benchmarking NVIDIA RAPIDS vs. Pandas: Join our Experiment with Thousands of GPUs!
    We are excited to announce a benchmarking experiment comparing the performance of NVIDIA RAPIDS and Pandas, two powerful data manipulation libraries, on a large Kubernetes-based cluster. Our cluster consists of thousands of A4000 GPUs, providing an excellent opportunity to evaluate these libraries for various workloads that involve data scrubbing. If you're interested in participating, we are offering the necessary compute resources, and you can even define your own datasets! What is NVIDIA RAPIDS? NVIDIA RAPIDS is a suite of open-source software libraries and APIs that accelerate data science and analytics workflows on GPUs. It aims to provide GPU-accelerated alternatives to popular data science tools, including Pandas. RAPIDS leverages the power of GPUs to speed up data processing, maki…  ( 9 min )
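    For reference, the usual drop-in pattern for this kind of comparison looks like the sketch below (assuming a working RAPIDS install; the file and column names are placeholders):
        import time
        import pandas as pd
        import cudf   # GPU DataFrame library from NVIDIA RAPIDS

        def bench(read_csv, path):
            t0 = time.time()
            df = read_csv(path)
            df = df.dropna().drop_duplicates()          # simple scrubbing steps
            df.groupby("key").agg({"value": "mean"})
            return time.time() - t0

        cpu_s = bench(pd.read_csv, "data.csv")
        gpu_s = bench(cudf.read_csv, "data.csv")
        print(f"pandas: {cpu_s:.2f}s  cudf: {gpu_s:.2f}s")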
    [R] Semantic-SAM: Segment and Recognize Anything at Any Granularity
    We introduce Semantic-SAM, a universal image segmentation model that can segment and recognize anything at any desired granularity. 🔥code & demo link: https://github.com/UX-Decoder/Semantic-SAM 🔥paper link: https://arxiv.org/pdf/2307.04767.pdf 🔥Our model offers the following attributes from instance to part level: Granularity Abundance. Our model can produce all possible segmentation granularities for a user click with high quality, which enables more **controllable** and **user-friendly** interactive segmentation. Semantic Awareness. We jointly train SA-1B with semantically labeled datasets to learn the semantics at both object level and part level. High Quality. We build on the DETR-based model to implement both generic and interactive segmentation, and validate that SA-1B hel…  ( 9 min )
    [D] Confusion over AMD GPU Ai benchmarking
    I'm using an AI benchmarker from this website, https://ai-benchmark.com/ranking_deeplearning.html. It was the only AI benchmarker I could find that I could understand. For some reason the results are in AMD's favor. I always thought AI wasn't good enough on AMD, but it's beating out the 4090! Am I using this benchmark wrong, or has AMD really gotten good recently? Also, if anyone knows another benchmark I could use, that would be wonderful. submitted by /u/SociallyApparent [link] [comments]  ( 8 min )
    [P] Reward Prioritization
    Hi all! I will preface by saying I am very new to machine learning. I understand the fundamentals but have little practice in implementation. I am trying to create an AI to play a game that I have re-created in Python, using TensorFlow/Keras. The game is similar to Tetris: the goal is for the player to survive for as long as possible, and the player can score more points by making more complex moves. In my current situation, the game has been entirely coded and converted to a usable format for the AI (I used OpenAI's Gymnasium (the open-source fork)). Over the last two days I have created the neural net model (5 Dense layers with 6 output options) and have been tweaking variables with little noticeable difference in the AI's abilities. While I certainly need help in several areas and am open to any and all recommendations, I currently feel that my issues lie in the rewards. The AI plays 5000 rounds, then keeps the best 10% of all games based on the reward score; currently, the longer it survives, the higher the reward. However, I also want the model to prioritize score similarly to survival time. To compare this to Tetris, the player could survive for 1 hour only scoring single lines, or 5 minutes scoring 90% Tetrises, and end up with a higher 'score' than the one that survives longer. I am looking for a compromise between these two extremes that will allow me to select the best 10% of all the games played. I know that I may not have explained this problem very well, but I am open to all suggestions, and all comments are appreciated. Thanks! submitted by /u/TCPisJustFancyUDP [link] [comments]  ( 9 min )
    "[D]" Problems with Lazy Config detectron2 (MViTv2)
    Hey. So, I want to use a config file from a detectron2 project which is undere MViTv2 in my custom dataset which is "my_dataset_train" (datacatalog format) I want to use this config file https://github.com/facebookresearch/detectron2/blob/main/projects/MViTv2/configs/mask_rcnn_mvitv2_t_3x.py like the beneath typical way I use a yaml config file. But giving so many errors one after another that, I even failed to count at this point. ​ I have to use this config file with the dataloader which is in https://github.com/facebookresearch/detectron2/blob/main/projects/MViTv2/configs/common/coco_loader.py. I figured that i can use cfg.dataloader.train.dataset.names = "my_dataset_train" for this. ​ cfg = LazyConfig.load("detectron2/projects/MViTv2/configs/mask_rcnn_mvitv2_t_3x.py") cfg.dataloader.train.dataset.names = "my_dataset_train" train = model_zoo.get_config("common/train.py").train cfg.train.amp.enabled = True cfg.train.ddp.fp16_compression = True cfg.lr_multiplier = L(WarmupParamScheduler)( scheduler=L(MultiStepParamScheduler)( values=[1.0, 0.1, 0.01], milestones=[52500, 62500, 67500], ), warmup_length=250 / train.max_iter, warmup_factor=0.001, ) # cfg.train.init_checkpoint = "detectron2://ImageNetPretrained/mvitv2/MViTv2_T_in1k.pyth" ​ cfg.dataloader.train.total_batch_size = 1 cfg.train.max_iter = 200 cfg.optimizer = model_zoo.get_config("common/optim.py").AdamW cfg.optimizer.lr = 1.6e-4 dataset = instantiate(cfg.dataloader.train) # optimizer = instantiate(cfg.optimizer) # optimizer = model_zoo.get_config("common/optim.py").A ​ # trainer = AMPTrainer(model, dataset, optimizer) trainer = DefaultTrainer(cfg) trainer.train() # FileLink(str(OUTPUT_MODEL)) ​ But typical trainer.train(start_iter,cfg.max_iter) is not working here? How can run this without shell command? Thanks everyone submitted by /u/Ok-Reflection-4049 [link] [comments]  ( 9 min )
    [D] Keras 3.0 Announcement: Keras for TensorFlow, JAX, and PyTorch
    Keras just announced a preview version of Keras 3.0. It's a full rewrite of the Keras codebase that rebases it on top of a modular backend architecture. It makes it possible to run Keras workflows on top of arbitrary frameworks — starting with TensorFlow, JAX, and PyTorch. https://keras.io/keras_core/announcement/ submitted by /u/codemaker1 [link] [comments]  ( 8 min )
    Backpropagation with Cross-Entropy and Softmax, HOW? [D]
    Let Zs be the inputs of the output layer (for example, Z1 is the input of the first neuron in the output layer), Os be the outputs of the output layer (which are the results of applying the softmax activation function to the Zs, for example, O1 = softmax(Z1)), and Ys be the target values (which are 0 or 1, because in this example we are dealing with classification problems and using one-hot encoding). E is the sum of the neurons' losses using the cross-entropy loss function. Let's say our neural network has 2 output neurons, and Y1 = 1 (so Y2 = 0). What is the derivative of E with respect to Z1, and the derivative of E with respect to Z2? After calculations, I came to the conclusion that the values of all derivatives of E with respect to the Zs (Z1 and Z2) should be equal, because they are all equal to O1-1 (since Y1 = 1, as I said). So am I right or wrong (and why)? submitted by /u/qaz_zaqi [link] [comments]  ( 9 min )
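    For reference, the standard result is dE/dZ_i = O_i - Y_i, so with Y = (1, 0) one gets dE/dZ1 = O1 - 1 and dE/dZ2 = O2; since O2 = 1 - O1 for two classes, the two derivatives are negatives of each other rather than equal. A quick finite-difference check:
        import numpy as np

        Z = np.array([0.3, -1.2])
        Y = np.array([1.0, 0.0])

        def loss(Z):
            O = np.exp(Z) / np.exp(Z).sum()     # softmax
            return -np.sum(Y * np.log(O))       # cross-entropy

        O = np.exp(Z) / np.exp(Z).sum()
        analytic = O - Y                        # dE/dZ = O - Y
        h = 1e-6
        numeric = np.array([(loss(Z + h * e) - loss(Z - h * e)) / (2 * h) for e in np.eye(2)])
        print(analytic, numeric)                # the two agree closely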
    Generating a step by step painting [D]
    I have been thinking about this for a long time. Is there any way we can generate each step of a painting from an image input? If I give an image as input, it should be able to give me step-by-step instructions on how to paint. I like painting but I got bored of You submitted by /u/dosa-palli-chutney [link] [comments]  ( 8 min )
    [D] Compared to paying a monthly subscription to ChatGPT, or per token for long-form content on the OpenAI platform, Is Poe.com's annual payment worth the money?
    Pretty much as the title says. I'm wondering whether it would be cheaper paying for PdfGPT, Claude2 and GPT-4 access separately, or Poe.com as a bundle? submitted by /u/RTSBasebuilder [link] [comments]  ( 8 min )
    [R] Why was Tacotron trained on <1000h of data?
    Tacotron TTS models (e.g. Tacotron 2 and Parallel Tacotron 2) were trained on 25h and 405h of speech data, respectively. By comparison, more recent TTS systems are trained on >50,000h of speech data. Why were Tacotron models trained on such a relatively small volume of data? Was it simply a matter of compute resource, or is there something in the Tacotron architecture/training set-up that explains the relatively small training size? submitted by /u/Upstairs_Buy_6243 [link] [comments]  ( 8 min )
    [Discussion] AWS Sagemaker Issue
    Hi everyone, I have data that includes these columns: customer_id, product_id, product_title, product_category, star_rating. I then used the functions, code, and AWS SageMaker config shown in the screenshots of the original post for model training, and I tried to write a function for predicting the 5 best products for 1 customer using the deployed endpoint. Nevertheless, I got a strange issue and cannot resolve it (SSLError: SSL validation failed for https://runtime.sagemaker.us-east-1.amazonaws.com/endpoints/factorization-machines-2023-07-10-12-13-54-256/invocations EOF occurred in violation of protocol (_ssl.c:2396)). Could someone please help me explain this, give me a solution, or help me solve it? submitted by /u/Hung_98 [link] [comments]  ( 8 min )
    [D] Seeking Comprehensive Project Tutorial for Industry Popular ML Tech Stack
    Hi all, I have a MS that focused in AI/ML, and took a role for the last several years that was supposed to focus on ML but never did. I'm now looking for jobs in this area and want to be able to list understanding/proficiency in industry popular technologies on my resume. Specifically, I'm looking to incorporate the following: Data Preprocessing: Numpy, pandas, SQL for data extraction Machine Learning Frameworks: TensorFlow, PyTorch (perform GPU training), Keras, XGBoost, LightGBM Model Deployment and Scalability: Docker and Kubernetes Big Data Processing: Apache Spark, Hadoop Version Control: Git Cloud Services: AWS, Microsoft Azure, or Google Cloud Platform System: Unix-Based I know these individual technologies have their own tutorials and documentation online. Does anyone have a more comprehensive project/tutorial that integrates these technologies? I'd like to create my own project that I can put on my resume to demonstrate proficiency, but I'm feeling a bit overwhelmed learning the entire stack and how to integrate it. Thanks! submitted by /u/ruseriousrightnow [link] [comments]  ( 9 min )
    [D] Scaling Neuroscience Research Using Federated Learning
    https://ieeexplore.ieee.org/document/9433925 Research on Federated Learning and its applications moved a step forward! https://github.com/NevronAI/metisfl submitted by /u/No-Literature-1930 [link] [comments]  ( 8 min )
    [D] Weird loss behaviour with diffusion models.
    Has anyone had this happen when training a diffusion model? The loss decreases to a very low value (close to 0) quite early (around half of the first epoch) and keeps oscillating there. Image quality improves throughout training, but the loss isn't really decreasing, just fluctuating around the same values. I had this happen when training pixel-space diffusion models (with latent diffusion the loss seems to decrease gradually), and when fine-tuning Stable Diffusion with textual inversion (the loss isn't really decreasing whereas image quality is increasing). submitted by /u/theotherfellah [link] [comments]  ( 8 min )
    [R] Large Language Models as General Pattern Machines. In context, LLMs are capable of completing a wide variety of complex non-linguistic patterns.
    Blog/Paper - https://general-pattern-machines.github.io/ submitted by /u/MysteryInc152 [link] [comments]  ( 8 min )
    [P] [R] Machine Unlearning Summary
    Hey, fellow data enthusiasts! We made a Kaggle notebook that delves into an intriguing concept called "Machine Unlearning: The Right to Be Forgotten." If you're interested in exploring the uncharted territory of reversing machine learning models and allowing data to be forgotten, this notebook is an absolute must-read! 📚 Notebook: https://www.kaggle.com/code/tamlhp/machine-unlearning-the-right-to-be-forgotten/ Overview: We all know how machine learning algorithms have transformed our lives by uncovering patterns and making predictions. However, what if we want to reverse this process and erase the knowledge acquired by these models? That's where machine unlearning comes into play, and this notebook provides an in-depth exploration of this cutting-edge concept. Key Highlights: Unde…  ( 9 min )
    [D] - Representation Learning MSc course: Videos + PyTorch exercises
    [🎥📚 Exciting NEWS, everyone! 🌟 ] - I'm ultra thrilled to announce that the first ever online #RepresentationLearning MSc course is now publicly available. Ready to learn how to learn informative representations from images, proteins, and natural language? 🎓📹 1️⃣ Visit https://www.youtube.com/playlist?list=PL3mKiGE4zNJJ83K4c3IBka6eYfe6v71dS to access the full playlist of recorded lectures. 📖💡 2️⃣ Feel free to tag anyone who might be interested. 💬🤝 3️⃣ Spread the word and help us out! 🌍🗣️ 4️⃣ Github link with all the material: https://github.com/HHU-MMBS/RepresentationLearning_SS2023 It's our (together with Felix Michels and Tim Kaiser) first attempt, thus be prepared for some mistakes along the way. Let us know how you like it! Have a great day, N.A. submitted by /u/black0017 [link] [comments]  ( 8 min )
    [P] Using LIME for this?
    Hello peeps, so I implemented this - https://github.com/karndeepsingh/Extract_key_information_Document_understanding/blob/main/Fine_tuning_LayoutLMForTokenClassification_on_FUNSD.ipynb - so the model predicts the tokens and the bounding boxes (whether it's a question, an answer, or other). I am trying to use LIME here to validate the results, but I don't know which LIME explainer to use: the text one or the image one? This notebook uses OCR to get the words in the document. I don't get how to use LIME here, like what to look for? submitted by /u/Affectionate_Win2460 [link] [comments]  ( 8 min )
    [R] A Survey of Machine Unlearning
    submitted by /u/KingsmanVince [link] [comments]  ( 8 min )
    [D] [R] Use cases for Generative AI in Robotics
    Hi everyone, Are there any use cases for Generative AI in Robotics apart from autonomous decision making? I am looking for other Gen AI use cases for our cobots, but I couldn't find any. Could you guys share your ideas and use cases? submitted by /u/Meeloveall [link] [comments]  ( 8 min )
  • Open

    DSC Weekly 11 July 2023
    Announcements Top Stories In-Depth The post DSC Weekly 11 July 2023 appeared first on Data Science Central.  ( 20 min )
  • Open

    An open-source gymnasium for machine learning assisted computer architecture design
    Posted by Amir Yazdanbakhsh, Research Scientist, and Vijay Janapa Reddi, Visiting Researcher, Google Research Computer Architecture research has a long history of developing simulators and tools to evaluate and shape the design of computer systems. For example, the SimpleScalar simulator was introduced in the late 1990s and allowed researchers to explore various microarchitectural ideas. Computer architecture simulators and tools, such as gem5, DRAMSys, and many more have played a significant role in advancing computer architecture research. Since then, these shared resources and infrastructure have benefited industry and academia and have enabled researchers to systematically build on each other's work, leading to significant advances in the field. Nonetheless, computer architectu…  ( 93 min )
  • Open

    Access private repos using the @remote decorator for Amazon SageMaker training workloads
    As more and more customers are looking to put machine learning (ML) workloads in production, there is a large push in organizations to shorten the development lifecycle of ML code. Many organizations prefer writing their ML code in a production-ready style in the form of Python methods and classes as opposed to an exploratory style […]  ( 8 min )
  • Open

    Russian transliteration hack
    I mentioned in the previous post that I had been poking around in HTML entities and noticed symbols for Fourier transforms and such. I also noticed HTML entities for Cyrillic letters. These entities have the form & + transliteration + cy;. For example, the Cyrillic letter П is based on the Greek letter Π and […] Russian transliteration hack first appeared on John D. Cook.  ( 5 min )
    Symbols for transforms
    I was looking through HTML entities and ran across ℱ. I searched for all entities ending in trf; and also found ℳ, ℒ, and ℨ. Apparently “trf” stands “transform” and these symbols are intended to be used to represent the Fourier transform, Mellin transform, Laplace transform, and z-transform. You would not know from the Unicode […] Symbols for transforms first appeared on John D. Cook.  ( 5 min )
  • Open

    Microsoft at ICALP 2023: Deploying cloud capacity robustly against power failures
    Efficiency is vital in the face of escalating demand for cloud resources. And efficient power management strategies address the bottleneck of power availability in datacenters. Learn how we optimize power allocation to support sustainable resource usage. The post Microsoft at ICALP 2023: Deploying cloud capacity robustly against power failures appeared first on Microsoft Research.  ( 11 min )
  • Open

    Sierra Division Studios Presents Three Epic Projects Built With NVIDIA Omniverse
    Jacob Norris is a 3D artist and the president, co-founder and creative director of Sierra Division Studios — an outsource studio specializing in digital 3D content creation.  ( 9 min )
  • Open

    GPT-4's details are leaked
    submitted by /u/nickb [link] [comments]  ( 8 min )
    Intel reports high-rendering graphics with low-power GPUs
    submitted by /u/keghn [link] [comments]  ( 8 min )

  • Open

    Afro Pink
    submitted by /u/Akumetsu_971 [link] [comments]  ( 8 min )
    How could zero-point energy help AI go to the next level?
    I am curious about this mainly because many people are talking about UAPs! The technology that powers these UAPs is most likely not oil, gas, or batteries; more likely zero-point energy. My main question, like I said, is how it could make AI even better, reaching heights it has never reached before! submitted by /u/jdgementdragonotk [link] [comments]  ( 8 min )
    What are some GitHub security best practices?
    It seems like about 90% of the stuff happening in AI is only accessible via GitHub. I'm probably just being overly cautious, but downloading something from such a public place is just not something I am currently comfortable with. What are your thoughts on this? Are there precautions you take that I should be aware of before venturing into this territory? Or is it generally considered pretty safe and nothing to worry about much? submitted by /u/gcubed [link] [comments]  ( 8 min )
    Offline music AI generator
    Hi all, I'm searching for a music AI for my own game, and I want to find an AI without copyright restrictions. But every model I found was online, with copyright rules. Is there any offline, or better, open-source AI model? submitted by /u/krakotay1 [link] [comments]  ( 8 min )
    An open-source AI on a European server, reading PDFs (texts), answering questions in different languages - please help
    Hey all - I am looking for an open-source chatbot. It will get a bunch of PDF documents (training data). This data should be searchable, and the bot should be able to answer questions in different languages. So far, so easy: we're in Germany, which means we have strict regulations for servers that are NOT hosted in the EU (god, I really hate to write that). So we need a model that can be installed here, trained here, and finally do its work here. If you have any advice, I would be really thankful! submitted by /u/myreddit333 [link] [comments]  ( 8 min )
    How is it possible that there were no LLM AIs, then there was ChatGPT, now there are dozens of similar products?
    Like, didn’t ChatGPT need a whole company in stealth mode for years, with hundreds of millions of investment? How is it that they release their product and then overnight there are competitors – and not just from the massive tech companies? submitted by /u/Aquillyne [link] [comments]  ( 8 min )
    The Rockstars of the AI World
    AGI: Brainiac robots will be like the ultimate multitaskers on steroids. They will possess mind-blowing cognitive abilities, making them the geniuses who can solve complex math problems in their sleep and then casually drop fashion advice that would make Coco Chanel blush. You might catch them pondering the mysteries of the universe or composing symphonies that give Beethoven a run for his money. Are they secretly working as undercover geniuses on the weekends? All we know is that AGI will be among us to challenge our notion of human intelligence and make us wonder if we should start polishing our robot-dance moves to keep up and maintain them entertained. submitted by /u/Powerful-Pumpkin-938 [link] [comments]  ( 8 min )
    Using google account data for AI personalization
    Brand new to AI and trying to understand possible functionalities, so bear with me if I sound uneducated. I know that you can download all of your Google account data (which I think is probably the largest amount of personalized information that most of us have easy access to; it includes YouTube history, Google Pay transactions, browsing history, and tons of other info). But my question is: after downloading this data, how could you implement it into some localized AI program to improve your life? It knows what restaurants you frequent, items you buy, travel patterns, vacations, email history, etc. I would love to hear people's ideas on this, thanks. submitted by /u/G-Boogie42 [link] [comments]  ( 8 min )
    Any methods for achieving close to elevenlabs quality and inference speed with a local model?
    My upcoming master's thesis (on the use of AI models for believable NPCs in video games) will involve the use of text to speech. Currently ElevenLabs has been my go-to for TTS, but the pricing model is quite inconvenient since it's a monthly subscription instead of pay-per-use. I'm sure some of you here are knowledgeable about TTS; it would be great if anyone could point me in a good direction. I want to use TTS to generate natural, human-sounding conversational voices. Ideally I could finetune the model on different voice profiles to get a wide range of voices. I have only started to look into options outside ElevenLabs, but I figured I should ask here before I start diving deep, so I can avoid unnecessary waste of time! If this is not achievable locally, it would be great to know if there are any methods for hosting a model on a cloud compute platform, so I can at least pay for use instead of dealing with the monthly subscription that comes with ElevenLabs. Thank you! submitted by /u/timidavid350 [link] [comments]  ( 9 min )
    ai generated video compilations
    Was wondering if there's an AI video tool where you put in keywords and it creates a compilation of gifs/clips based on your text. For example, if you typed in "dog", "car" and "clouds" it would create a video compiled of short clips of dogs, cars and clouds. Sorry if this is a dumb question, I was just curious if this exists haha. Thanks! submitted by /u/bonlom [link] [comments]  ( 8 min )
  • Open

    How would one normalize observations in off-policy online reinforcement learning?
    Can someone please help with this question? Please let me know if you'd like me to paste the whole question in the description here. Thank you! submitted by /u/Academic-Rent7800 [link] [comments]  ( 8 min )
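    One common answer is to keep running mean/variance statistics updated online (Welford-style) and apply the same normalizer both when acting and when sampling old transitions from the replay buffer; a minimal sketch (one option among several):
        import numpy as np

        class RunningNorm:
            def __init__(self, shape, eps=1e-8):
                self.mean = np.zeros(shape)
                self.var = np.ones(shape)
                self.count = eps
            def update(self, x):                # call on every new observation
                self.count += 1
                delta = x - self.mean
                self.mean += delta / self.count
                self.var += (delta * (x - self.mean) - self.var) / self.count
            def __call__(self, x):              # use for both fresh and replayed obs
                return (x - self.mean) / np.sqrt(self.var + 1e-8)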
    Tuning Rewards/Adding More Rewards in RL is a headache
    I am working on RL and it is really exciting. But in a lot of cases I don't get the desired behaviours. So I either tune the weights of existing rewards or add more reward terms to try to fix that, but I later realized that adding more rewards just makes the problem more complex, and I even get worse performance than before. I also tried hyperparameter tuning, but it is really slow and I don't see much difference compared with my heuristic choices. I read some RL papers and I never figured out where these magic weights come from. Anyone who can provide some advice would be very helpful :) submitted by /u/Alchemist1990 [link] [comments]  ( 8 min )
    Causes of RL agent reaching a more optimal policy but not continuing to improve
    Hey, I'm developing an RL agent for optimal setpoint control, but I'm experiencing some weird behaviour where the agent reaches a policy that performs better than all previous policies, yet doesn't continue improving and instead goes back and gets stuck at a local minimum. I'm assuming this behaviour comes from wrong hyperparameter tuning, but I was wondering if there could be other sources for it. The training plot (lower score is more optimal) shows this; I'm training with PPO and had a non-zero value for the entropy coefficient. Edit: when I stop training at the dips and test the agent, it quite consistently scores better. submitted by /u/LeSUTHU [link] [comments]  ( 8 min )
    "Solving math word problems with process- and outcome-based feedback", Uesato et al 2022 {DM}
    submitted by /u/gwern [link] [comments]  ( 8 min )
    Extensions for SAC
    I am a beginner in reinforcement learning and stumbled across SAC. While all the other off-policy algorithms seem to have extensions (DQN has DDQN; DDPG has TD3), I am wondering what extensions for SAC are worth a look. I already found 2 papers (DR3 and TQC) but I'm not experienced enough to evaluate them, so I thought about building them and comparing them to others. Would be nice to hear someone's opinion :) submitted by /u/MChiefMC [link] [comments]  ( 8 min )
    PPO agent for "2048": help requested
Hi folks, I have started to dip my toes into reinforcement learning recently, reproducing some basic algorithms (DQN, REINFORCE, DDPG) on the easy Gymnasium examples with success. I have now started a new project with a custom environment: 2048. It seems to be an interesting target for a deep reinforcement learning agent: the action space is small (at most four directions) but the observation space is large, requiring a neural network to generalize over it. Here's where the problem starts: after implementing a custom environment that follows the typical Gymnasium interface and using a slightly adjusted PPO implementation from CleanRL, I cannot get the agent to learn anything at all, even though this specific implementation works just fine on basic Gymnasium examples. I am hoping…  ( 10 min )
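For reference, a compact 2048 environment sketch against the Gymnasium API, under my own design assumptions (log2-scaled observations, merge score as reward); this is not the poster's implementation.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class Env2048(gym.Env):
    """Minimal 2048 environment sketch. Observations are log2-scaled tiles,
    which keeps network inputs in a small range instead of 2..65536."""
    def __init__(self):
        self.action_space = spaces.Discrete(4)  # 0:left 1:right 2:up 3:down
        self.observation_space = spaces.Box(0.0, 16.0, shape=(16,), dtype=np.float32)

    def _spawn(self):
        empty = list(zip(*np.nonzero(self.board == 0)))
        r, c = empty[self.np_random.integers(len(empty))]
        self.board[r, c] = 2 if self.np_random.random() < 0.9 else 4

    def _slide_row(self, row):
        tiles = [t for t in row if t]              # compress non-zero tiles left
        out, reward, i = [], 0, 0
        while i < len(tiles):
            if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
                out.append(tiles[i] * 2); reward += tiles[i] * 2; i += 2
            else:
                out.append(tiles[i]); i += 1
        return out + [0] * (4 - len(out)), reward

    def _obs(self):
        return np.log2(np.maximum(self.board, 1)).astype(np.float32).ravel()

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.board = np.zeros((4, 4), dtype=np.int64)
        self._spawn(); self._spawn()
        return self._obs(), {}

    def step(self, action):
        # Rotate so every move becomes "slide left", then rotate back.
        k = [0, 2, 1, 3][action]
        b = np.rot90(self.board, k=k)
        rows, reward = [], 0
        for row in b:
            new, r = self._slide_row(list(row)); rows.append(new); reward += r
        b2 = np.rot90(np.array(rows), k=-k)
        moved = not np.array_equal(b2, self.board)
        self.board = b2
        if moved:                                  # no-op moves leave the board as-is
            self._spawn()
        terminated = not any(
            self._slide_row(list(r))[0] != list(r)
            for a in range(4)
            for r in np.rot90(self.board, k=[0, 2, 1, 3][a])
        )
        return self._obs(), float(reward), terminated, False, {"moved": moved}
```

One thing worth checking in a setup like this is observation scale: feeding raw tile values (2 to 65536) into a network often stalls learning, which is why the sketch uses log2 tiles.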
    How to work with a large action space?
I want to train a single agent for a game using reinforcement learning. I am totally new to this field. In the game, 4 characters can be moved around on a 10x10 board. After a character is moved to a new position, it can build something on one of the grid cells within a certain range of its new position. Each character has a different range of tiles where it can move and where it can build a house. So, in general, the action space is 4x100x100 = 40000 including all the illegal moves, which is huge. Can you advise any way to shrink the action space? I am using Gymnasium to define the environment. Previously, I thought about using MultiDiscrete or Tuple spaces for the action space, e.g. one dimension for characters, one dimension for 100 tiles to move, and one dimension for 100 tiles to build the house. However, how can I then mask the illegal moves? submitted by /u/naeson [link] [comments]  ( 9 min )
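Gymnasium itself does not apply masks for you; the common pattern is to flatten the factored action into one Discrete head and push illegal logits to -inf before sampling (MultiDiscrete's independent heads cannot express constraints like "this tile is only legal for this character"). A minimal sketch with made-up sizes:

```python
import torch

def masked_sample(logits: torch.Tensor, mask: torch.Tensor):
    """Sample only legal actions: illegal logits are pushed to -inf so
    their probability becomes exactly zero after softmax."""
    masked_logits = logits.masked_fill(~mask, float("-inf"))
    dist = torch.distributions.Categorical(logits=masked_logits)
    action = dist.sample()
    return action, dist.log_prob(action)

# Toy example: 4 characters x 100 move tiles flattened into one Discrete(400)
# head. Flattening lets a single boolean mask encode cross-dimension legality.
logits = torch.randn(1, 400)
mask = torch.zeros(1, 400, dtype=torch.bool)
mask[0, :25] = True   # pretend only the first 25 (character, tile) pairs are legal
action, logp = masked_sample(logits, mask)
print(action.item(), logp.item())
```

Fully flattening 4x100x100 gives 40,000 logits, which is large but workable; a common alternative is to factor the policy autoregressively (pick character, then move, then build), masking each stage with a mask conditioned on the previous choices.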
  • Open

[D] NVIDIA RAPIDS vs Pandas
Hello ML Team, I am reaching out on behalf of Data Care LLC. We developed a large Kubernetes-based cluster with thousands of A4000 GPUs and are currently conducting a benchmarking experiment to compare NVIDIA RAPIDS against pandas on various-sized workloads that need a bit of scrubbing. Here are the details: Infra 1: pandas with 6 CPU cores, 16 GB RAM. Infra 2: A4000 with 16 GB VRAM, 16 GB RAM and 6 CPU cores. Synthetic datasets of 0.8 GB, 1.6 GB, 3.2 GB and 6.4 GB containing errors that need to be scrubbed. You are welcome to define your own dataset. If any of you are interested in participating in this experiment, we would be happy to provide you with the necessary compute resources. To sign up, please visit www.thedatacare.com/experiment-rapid We greatly appreciate your participation and encourage you to publish your results to contribute to the community. Thank you, Arun Data Care LLC submitted by /u/arunpalepu1981 [link] [comments]  ( 9 min )
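For participants, the RAPIDS comparison usually boils down to swapping pandas for cuDF, which mirrors the pandas API (and requires a RAPIDS install plus an NVIDIA GPU). A hedged sketch; the file name and "value" column are placeholders, not part of the benchmark spec:

```python
import time
import pandas as pd
import cudf  # NVIDIA RAPIDS GPU DataFrame library

path = "synthetic_0.8gb.csv"  # hypothetical benchmark file

# CPU baseline with pandas: read, impute, dedupe.
t0 = time.perf_counter()
pdf = pd.read_csv(path)
pdf["value"] = pdf["value"].fillna(pdf["value"].median())
pdf = pdf.drop_duplicates()
cpu_s = time.perf_counter() - t0

# Same scrubbing on the GPU with cuDF; note the API is nearly identical.
t0 = time.perf_counter()
gdf = cudf.read_csv(path)
gdf["value"] = gdf["value"].fillna(gdf["value"].median())
gdf = gdf.drop_duplicates()
gpu_s = time.perf_counter() - t0

print(f"pandas: {cpu_s:.2f}s  cuDF: {gpu_s:.2f}s  speedup: {cpu_s / gpu_s:.1f}x")
```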
    [DISCUSSION] How much can we trust OpenAI (and other large AI companies) will keep our fine tuned models and data sets private?
tldr: Do you trust OpenAI or other large AI companies with your data? Do you reckon it's just a matter of time before they find all of the data, so might as well contribute to their research project and benefit from it while you can? Or do you prefer to go the open-source route instead for this reason? Here is my concern: some of my team members are very high on OpenAI's models, their ease of use, and how smart they are out of the box. Now, OpenAI (relatively recently) published a statement saying that they will not use your fine-tuned models or datasets internally to improve their products, but given their history and the value of these fine-tuned models and their corresponding datasets, I'm uncertain to what extent we are able to trust that they will keep our data private. I like t…  ( 9 min )
    Just was Offered a $130k Machine Learning Engineer Position - Is this Salary too Low? [D]
    I am about to graduate with my master's and I was offered a Machine Learning Engineer role with a salary of $130k. I have data science experience, but not much programming experience. I pushed back on the salary and said I was expecting a little more (I was paid more as a Data Scientist), and the recruiter said the team was taking a chance on me and I need to prove myself before they would raise my salary. I also want to mention I am switching into work that is more coding heavy than what I was doing as a Data Scientist. The job is exactly what I want to do though and the team was really great to talk to. The company has great reviews online and wonderful benefits. I am leaning towards just taking the job but this is my first job interview and I don't want to sell myself short. FYI I live in California, not the Bay Area. submitted by /u/datatastic08200 [link] [comments]  ( 9 min )
    [D] What have been your use cases for LLM autonomous agents?
I've been using GPT for completions on a daily basis for a while now - code completion and search-like chatting, basically. I've recently been playing around with both ChatGPT plugins and LangChain for autonomous-agent-like behavior, and although the idea of the LLM interacting with the environment through API calls or code interpretation seems promising, in practice I haven't found a use case for it as useful and usable as completions yet. LangChain's OpenAPI toolkit with its planner/controller agent duo seems to get lost 90% of the time, making it unusable. This happens even with an /api endpoint telling it exactly how to interact with the API and prompt templates suggesting that this endpoint be used to get the API specs. Maybe I'm just not getting it right... As for ChatGPT plugins, other than web search for more updated results, I haven't really found a use case where I could not do the same thing with completions. Code Interpreter shaves off a few seconds vs completions and running whatever script it produces locally, but it's not very useful in the face of compliance or privacy requirements of not uploading stuff to OpenAI. For example, I wanted to speed up a work-related video and add a separate audio track to it. I couldn't upload the video to OpenAI as it contained internal work stuff, so I just used completions for an ffmpeg script to do the job and ran it locally. Same thing with transforming or plotting CSV data - can't really upload customer data to OpenAI, so just get the script and run it locally. Anyway, I can think of a lot of cool use cases for autonomous agents and the like, but I haven't been able to actually use them in my daily routine, unlike text completion. Have you been using autonomous agents successfully and regularly? submitted by /u/osantacruz [link] [comments]  ( 9 min )
    [D] When will the LLM winter come?
Everyone is jumping on the bandwagon of LLMs, and it seems like we should follow suit. But why? LLMs are impressive; they can play chess, emulate a Linux terminal, perform mathematical calculations, convert code, and more. However, in most cases, they perform these tasks poorly, which means they cannot be used as is, at least not for now. While we come across numerous cool demos, the results are often cherry-picked and not applicable to real business use cases. I can only identify a few limited use cases where LLMs truly shine: QA, summarization, and acting as a co-pilot for tasks such as coding, writing, and education. In essence, they excel at seq2seq tasks. There are some notable drawbacks that should be acknowledged: Cost: Using LLMs will never be cheaper than specialized solutions. Latency: Building real-time applications with LLMs is extremely challenging. Quality and Accuracy: For now, LLMs lack strong reasoning capabilities. Reliability: Hallucinations are still an issue. Occasionally, the model fails to follow instructions. Quotas: Currently, the quotas imposed on LLMs are too low for large-scale production applications. It is possible that most of these problems will be resolved in the future. However, the question remains: how long will it take? Will it be 1 year, 5 years, or even 10 years? Unfortunately, we need to see an ROI this year. Anyway, just want to remind you that there is no free lunch! submitted by /u/___mlm___ [link] [comments]  ( 9 min )
    [D] Regression Problem
I'm trying to develop an ML model to predict "time series" data. It's not actually a time series, it's a temperature series, but regardless, the order matters. I have 20 sets of tabular data in separate CSVs that each contain 3 columns. Column 1 is temperature from 800 to 1360, column 2 contains values I'd like to be able to predict, and column 3 contains values I'd like to use as input to predict the column-2 values at each respective temperature. Because the temperature range is between 800 and 1360, each CSV has 560 rows of data (1360 - 800 = 560). So in total there are 11,200 rows of data (20 sets * 560 = 11,200 total rows). The data in each column, excluding the temperature column, are second-order polynomials. So I'm essentially trying to get from one polynomial (column 3) to another (column 2). What would be the best ML method to achieve this given my problem? I know there's an underlying pattern but I need ML to help me determine it. Appreciate any help here, thanks!! submitted by /u/cgifted7 [link] [comments]  ( 9 min )
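Given the stated second-order structure, one reasonable first baseline (a sketch, not a prescription) is polynomial features plus ridge regression, with whole curves held out per fold so the score reflects generalization to an unseen temperature series. The file layout below is an assumption: 20 header-less CSVs with columns temp, target, input.

```python
import pandas as pd
from pathlib import Path
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.model_selection import GroupKFold, cross_val_score

# Assumed layout: data/*.csv, each with header-less columns [temp, target, input].
frames = []
for i, f in enumerate(sorted(Path("data").glob("*.csv"))):
    df = pd.read_csv(f, names=["temp", "target", "input"])
    df["series"] = i                    # remember which curve each row came from
    frames.append(df)
data = pd.concat(frames)

X, y = data[["temp", "input"]].values, data["target"].values

# Degree-2 features (temp, input, temp^2, temp*input, input^2) match the
# stated second-order structure; ridge keeps the fit numerically stable.
model = make_pipeline(PolynomialFeatures(degree=2), Ridge(alpha=1.0))

# Hold out whole curves, not random rows, so the score is not inflated by
# interpolation within a curve the model has already seen.
scores = cross_val_score(model, X, y, cv=GroupKFold(n_splits=5),
                         groups=data["series"], scoring="r2")
print(scores.mean())
```

If the linear-in-features model underperforms, a tree ensemble or small MLP on the same (temp, input) features is the natural next step.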
    Seeking Participants for AI-related Survey[R]
    I am currently working on my IB Extended Essay, and I would greatly appreciate your help in gathering valuable insights from individuals knowledgeable in the field of AI. The purpose of my survey is to understand the perspectives of AI enthusiasts and professionals like you. If you have a few minutes to spare, I kindly request you to participate in my survey. Your input will contribute significantly to my research and help me gain a deeper understanding of the topic. The survey covers various aspects of AI, and your expertise will be invaluable in shaping the results. Survey Link: https://forms.gle/PVGrRbPLTpZRbbpL9 Rest assured that all responses will be kept confidential and only used for academic purposes. Additionally, feel free to share this survey with others who might be interested or knowledgeable in the field. Thank you in advance for your time and contributions! Your participation will greatly aid in the successful completion of my IB Extended Essay. submitted by /u/KVNG_Winston [link] [comments]  ( 9 min )
    [R] All about evaluating Large language models
I explored my curiosity on how to best evaluate LLMs and LLM applications and consolidated my thoughts in this article https://explodinggradients.com/all-about-evaluating-large-language-models submitted by /u/iamikka [link] [comments]  ( 8 min )
    [D] Hacking LangChain for Fun and Profit
    https://blog.kevinhu.me/2023/07/10/hacking-langchain-for-fun-and-profit/ I'm starting a series of blogs to delve into LangChain. Hope this helps anyone who's interested in LLM and building with LangChain. submitted by /u/OLDGUN [link] [comments]  ( 8 min )
    [D] Forcing diversification in similarity search
    I’m using a vector database for storing image embeddings and using it for similarity search. If I pick top ten most similar vectors I can sometimes end up inside of an echo chamber with almost “duplicates” or too similar images. I would like to diversify the results so that all the results are close to the input vector but different between themselves. Are there common patterns/algorithms for this type of diversification? The idea that I have: I want to pick 10 images. I would query the database for a 100 of the most similar embeddings and use a clustering algorithm to cluster them into 10 clusters. Finally, I would pick one image from each cluster. submitted by /u/alkibijad [link] [comments]  ( 8 min )
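The standard algorithm for this is Maximal Marginal Relevance (MMR): over-fetch from the vector DB, then greedily re-rank, trading off similarity to the query against similarity to what has already been picked. A self-contained sketch (the cluster-then-pick idea in the post is a reasonable alternative):

```python
import numpy as np

def mmr(query: np.ndarray, candidates: np.ndarray, k: int = 10, lam: float = 0.7):
    """Maximal Marginal Relevance: greedily pick items similar to the query but
    dissimilar to items already picked. lam=1 is pure relevance, lam=0 is pure
    diversity. Vectors are assumed L2-normalized, so dot products are cosines."""
    sim_q = candidates @ query                # relevance of each candidate
    sim_c = candidates @ candidates.T         # pairwise candidate similarity
    selected = [int(np.argmax(sim_q))]
    while len(selected) < k:
        rest = [i for i in range(len(candidates)) if i not in selected]
        scores = [lam * sim_q[i] - (1 - lam) * sim_c[i, selected].max()
                  for i in rest]
        selected.append(rest[int(np.argmax(scores))])
    return selected

# Usage: fetch ~100 nearest neighbors from the vector DB, then re-rank locally.
rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 64))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
print(mmr(emb[0], emb, k=10))
```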
    [D] Website to get historical price for agriculture commodities?
Hey guys, I want to make price predictions for agricultural commodities such as grain, corn and coffee. I need historical price data like open, low, high, close, and volume. Do you guys know which websites provide this data for free? I tried https://markets.businessinsider.com/ but it was no help. submitted by /u/Classic-Fee6889 [link] [comments]  ( 8 min )
    [P] We built a paper sharing platform. Looking for alpha testers!
https://www.entepond.com We are a group of industry practitioners who got tired of sharing papers via social media platforms; they are not designed for that. It's hard enough to keep track of the trending/latest papers nowadays. So we built Pond, a paper-sharing platform that makes sense. submitted by /u/dockerun [link] [comments]  ( 8 min )
    [D][R] Help with bachelor thesis
Hi all, I am finishing the 3rd year of my UG AI program, and I've already been admitted to an MSc in AI. However, I need some ideas for my thesis, which should be around 30 pages long. I've been doing some research on what I would like to study, but I still haven't found anything I like. I am open to both an applied ML topic and a more theoretical one, so feel free to suggest something you think is appropriate 😄 submitted by /u/AskAmbitious5697 [link] [comments]  ( 8 min )
    [D] MPT-30b or Falcon 40b: which one is the better option
In the open-source community, two models that have gained a lot of popularity are MPT and Falcon. However, I wonder which one is the better LLM. What are your thoughts on both models? submitted by /u/paulo_zip [link] [comments]  ( 8 min )
    [Project] Brute force for popular models, Feature Reduction and Scaling Techniques for Classification [Python]
This is my first article trying something new: brute-forcing classification problems. Does anybody have suggestions to improve the code, add more features, or suggest another idea? https://baroodz.medium.com/brute-force-for-popular-models-feature-reduction-and-scaling-techniques-for-classification-69be9c0426b9 https://github.com/Barood-cmd/BruteforceML/ All opinions appreciated. submitted by /u/Barood_D [link] [comments]  ( 8 min )
[D] How does the RTX 3060 12GB fare against the RTX 4060 Ti 16GB for machine learning?
Ok, now that the RTX 4060 Ti 16GB has been launched, this question needs to be asked :) The RTX 4060 Ti 16GB is faster and has newer tensor cores, but has less memory bandwidth (128-bit, 288 GB/s), while the RTX 3060 12GB is slower (also cheaper) but has more memory bandwidth (192-bit, 360 GB/s), though less memory. Does anyone have experience with the new card for machine learning purposes? Specs: RTX 4060 Ti 16GB - https://www.techpowerup.com/gpu-specs/geforce-rtx-4060-ti-16-gb.c4155 RTX 3060 12GB - https://www.techpowerup.com/gpu-specs/geforce-rtx-3060-12-gb-ga104.c3832 submitted by /u/PhotonAttack [link] [comments]  ( 8 min )
    [D] - Issues with Model Collapse in VAE for Reconstruction Task
I am currently experiencing a suspected model collapse in a Variational Autoencoder (VAE) model I am working with. Below are details on the project setup and the issue at hand. Project goal: exploring whether the same VAE architecture can independently handle two tasks, reconstruction and regression, using real-world production-line data. Data: approximately 80,000 samples. Features: sensor readings such as flow, temperature, force, etc. (both binary and continuous data). Target for regression: the measurement of a product property at the end of the line (can be divided into approximately 50 classes). Preprocessing: data normalized between 0 and 1 before training. Model architecture: encoder with an input layer of 800 and a hidden layer of 600; latent space of size 400; decoder: …  ( 10 min )
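Since the symptoms sound like posterior collapse, one standard mitigation worth trying is annealing the KL weight so the encoder learns informative latents before the KL term can pull them to the prior. A hedged sketch using the post's stated dimensions (800 inputs, 400 latents); the warmup schedule is my assumption:

```python
import torch
import torch.nn.functional as F

def vae_loss(recon, x, mu, logvar, step, warmup_steps=10_000, beta_max=1.0):
    """Reconstruction + beta-annealed KL. Ramping beta from 0 lets the
    encoder learn useful latents before the KL term can collapse them
    to the prior (a common cause of VAE posterior collapse)."""
    recon_loss = F.mse_loss(recon, x, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    beta = beta_max * min(1.0, step / warmup_steps)
    return recon_loss + beta * kl

# Toy call with the post's dimensions (batch of 8, 800 features, 400 latents):
x, recon = torch.rand(8, 800), torch.rand(8, 800)
mu, logvar = torch.zeros(8, 400), torch.zeros(8, 400)
print(vae_loss(recon, x, mu, logvar, step=500))
```

It is also worth checking whether a 400-dimensional latent space is simply too large relative to the data; shrinking it is another common fix.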
    [P] Image based financial statements ratio analysis like quick ratio, debt to equity ratio?
I have balance sheets as images; using AWS OCR we extract the text, then format and segment it to find ratios like the quick ratio, current ratio, and debt-to-equity ratio, matching line items with fuzzy string matching. However, sometimes the values needed for these ratios are present but under different keywords, or the values are there but the total is not. Is there any literature or code base using an ML approach to do ratio analysis on financial statements? submitted by /u/Priyanshu1_ [link] [comments]  ( 8 min )
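One lightweight approach for the "different keywords" problem is to normalize OCR'd line items to canonical fields with a fuzzy matcher plus a synonym table before computing ratios such as quick ratio = (current assets - inventory) / current liabilities. A sketch using rapidfuzz; the synonym table and cutoff are illustrative assumptions:

```python
from rapidfuzz import process, fuzz

# Map noisy OCR labels to canonical balance-sheet fields; the synonym
# lists catch line items written under different keywords.
CANONICAL = {
    "current assets": ["total current assets", "curr. assets"],
    "current liabilities": ["total current liabilities", "curr. liabilities"],
    "inventory": ["inventories", "stock in trade"],
}

def match_field(ocr_label: str, score_cutoff: int = 75):
    choices = {syn: field for field, syns in CANONICAL.items()
               for syn in [field] + syns}
    hit = process.extractOne(ocr_label.lower(), choices.keys(),
                             scorer=fuzz.token_sort_ratio,
                             score_cutoff=score_cutoff)
    return choices[hit[0]] if hit else None

# Tolerates OCR typos and keyword variants; should map to "current assets".
print(match_field("Tot. Current Assetts"))
```

When a total is missing, a simple fallback is summing the matched child line items, flagging the statement for manual review when the components disagree.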
    [D] Data Management for Machine Learning Projects
    Hey, r/MachineLearning, it’s Nir from DagsHub 🐶 Data management may seem straightforward, but beneath the surface, it poses complex hurdles that can impact project success. We've been thinking a lot about it lately, particularly within machine learning projects. So we decided to conduct research, talking with companies and individual practitioners about the challenges they face. We wanted to shed light on the often-overlooked challenges that come with this crucial aspect. If you're interested in understanding the intricacies of data management and how it influences ML projects, I invite you to check out our latest blog post where we share our initial results. It's an informative read that aims to foster knowledge-sharing within our community. This is one area we can all learn a lot from each other. Feel free to share your own experiences and insights related to data management challenges and how you overcome them. Here's the link to the blog post: https://dagshub.com/blog/challenges-in-data-management-for-machine-learning/ As always, looking forward to hearing your thoughts! 🤗 submitted by /u/RepresentativeCod613 [link] [comments]  ( 9 min )
    [D] Choosing the right embeddings model
I'm trying to pick the right embeddings model to take a short story and retrieve similar short stories. I was looking through https://huggingface.co/spaces/mteb/leaderboard and trying to figure out which model I should use. Should I be looking at the top clustering models? I was going to use OpenAI's text-embedding-ada-002 but would rather use a better model if possible. submitted by /u/juniorskimbrough [link] [comments]  ( 8 min )
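For story-to-story similarity, the MTEB Retrieval and STS tabs are usually a better guide than the clustering tab. A minimal sketch with sentence-transformers; the model choice is just a small baseline, and long stories may need chunking since such models truncate inputs:

```python
from sentence_transformers import SentenceTransformer, util

# Small, fast baseline model; swap in a higher-ranked MTEB retrieval model freely.
model = SentenceTransformer("all-MiniLM-L6-v2")

stories = [
    "A lighthouse keeper befriends a stranded whale.",
    "A keeper of a remote lighthouse rescues a beached whale.",
    "Two rival chefs compete in a desert cooking contest.",
]
# Normalizing embeddings makes cosine similarity a plain dot product.
emb = model.encode(stories, normalize_embeddings=True)
scores = util.cos_sim(emb[0], emb[1:])  # similarity of story 0 to the rest
print(scores)  # the paraphrased lighthouse story should score much higher
```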
    [R] Adversarial Robust Deep Reinforcement Learning Requires Redefining Robustness
    https://ojs.aaai.org/index.php/AAAI/article/view/26009/25781 submitted by /u/ml_dnn [link] [comments]  ( 8 min )
    For deep learning practitioners in industry, is the workflow always this annoying? [D]
Right now I'm working on a multivariate time series classification problem for a lab I'm working at. The training set is a multivariate time series with data collected at minute-level resolution. Each time series is a type of positional tracking measurement. The goal is to classify each time series and see if they can discriminate between 3 different outcomes of interest. The labels are 0, 1, 2. I'm doing this in PyTorch, and I have read roughly 8-9 papers. They all seem to point to convolutional neural networks as a starting point. After building the model and setting up the training loops, this whole process feels like maybe 1% model building and 99% debugging, tuning, architecture changing, and tensor shape checking. My day to day has just been stepping through my architecture with …  ( 9 min )
    [D] LLM Multi-Rounds Fine-Tuning
I have 2 datasets that I want to use for fine-tuning multiple LLMs. Dataset #1 is large and has "raw" text data (many scraped webpages revolving around different topics that an LLM may not have been pre-trained on). Dataset #2 is very small and has selected examples depicting how a topic from Dataset #1 should be presented to the user during chat interactions. I am trying to determine a good strategy for fine-tuning. I could not find any examples of "two-round" fine-tuning in a scenario like this. I am thinking about using quantization and training just the last few layers of the selected LLM on Dataset #1, and using the LoRA method for Dataset #2. However, I do not have much compute. Any tips/suggestions/advice would be greatly appreciated! submitted by /u/ilovejoi36912 [link] [comments]  ( 8 min )
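One plausible recipe, sketched under QLoRA-style assumptions (the model name and hyperparameters are illustrative, not a recommendation from the post): run both rounds as LoRA adapters on a 4-bit base model, which is typically far cheaper than unfreezing the last layers.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# QLoRA-style setup: 4-bit base model + trainable low-rank adapters.
bnb = BitsAndBytesConfig(load_in_4bit=True,
                         bnb_4bit_compute_dtype=torch.bfloat16)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",        # any causal LM you have access to
    quantization_config=bnb, device_map="auto")

lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(base, lora)
model.print_trainable_parameters()     # typically well under 1% of weights

# Round 1: continued pretraining on Dataset #1 with the plain causal-LM loss.
# Round 2: either keep training the same adapter on Dataset #2, or save the
# round-1 adapter and start a fresh one; for the chat-formatted examples,
# compute the loss only on the assistant turns.
```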
  • Open

    Visualizing a determinant identity
    The previous post discussed an algorithm developed by Charles Dodgson (better known as Lewis Carroll) for computing determinants. The key identity for proving that Dodgson’s algorithm is correct involves the Desnanot-Jacobi identity from 1841. The identity is intimidating in its symbolic form and yet easy to visualize. In algebraic form the identity says Here a […] Visualizing a determinant identity first appeared on John D. Cook.  ( 5 min )
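For reference, the Desnanot-Jacobi identity, writing $M_i^j$ for the matrix $M$ with row $i$ and column $j$ deleted (notation mine; the original post displayed the equation as an image), states:

```latex
\det(M)\,\det\bigl(M_{1,n}^{1,n}\bigr)
  = \det\bigl(M_1^1\bigr)\,\det\bigl(M_n^n\bigr)
  - \det\bigl(M_1^n\bigr)\,\det\bigl(M_n^1\bigr)
```

where $M$ is $n \times n$ and $M_{1,n}^{1,n}$ deletes both the first and last rows and columns.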
    How Lewis Carroll computed determinants
    Charles Dodgson, better known by his pen name Lewis Carroll, discovered a method of calculating determinants now known variously as the method of contractants, Dodgson condensation, or simply condensation. The method was devised for ease of computation by hand, but it has features that make it a practical method for computation by machine. Overview The […] How Lewis Carroll computed determinants first appeared on John D. Cook.  ( 6 min )
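A short sketch of the condensation procedure described in the post (my reconstruction, not code from the article). Note that Dodgson's method divides by interior entries, so a zero interior entry requires permuting rows or columns, which this sketch does not handle.

```python
import numpy as np

def dodgson_det(A):
    """Dodgson condensation: repeatedly replace a k x k matrix by the
    (k-1) x (k-1) matrix of its connected 2x2 minors, dividing entrywise
    by the interior of the matrix from two steps earlier.
    Caveat: fails if an interior entry is 0 (Dodgson handled this by
    permuting rows/columns); this sketch does not."""
    curr = np.array(A, dtype=float)
    n = curr.shape[0]
    prev = np.ones((n + 1, n + 1))   # dummy divisor for the first pass
    for _ in range(n - 1):
        m = curr.shape[0]
        nxt = np.empty((m - 1, m - 1))
        for i in range(m - 1):
            for j in range(m - 1):
                nxt[i, j] = (curr[i, j] * curr[i + 1, j + 1]
                             - curr[i, j + 1] * curr[i + 1, j]) / prev[i + 1, j + 1]
        prev, curr = curr, nxt
    return curr[0, 0]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
print(dodgson_det(A), np.linalg.det(A))  # both should give -3
```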
    Gram matrix
    An elegant algebraic identity says If x is the vector [a b] and y is the vector [c d] then this identity can be written where the dot indicates the usual dot product. I posted this on Twitter the other day. Gram matrix Now suppose that x and y are vectors of any length n. […] Gram matrix first appeared on John D. Cook.  ( 5 min )
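The two-by-two identity being described is presumably the Brahmagupta-Fibonacci (two-square) identity; my reconstruction of the stripped equations, in coordinates and in dot-product form, with $x = [a\; b]$ and $y = [c\; d]$:

```latex
(a^2 + b^2)(c^2 + d^2) = (ac + bd)^2 + (ad - bc)^2,
\qquad\text{i.e.}\qquad
(x \cdot x)(y \cdot y) = (x \cdot y)^2 + \left(\det\begin{pmatrix} a & b \\ c & d \end{pmatrix}\right)^{2}
```

Equivalently, the determinant of the $2 \times 2$ Gram matrix with entries $x \cdot x$, $x \cdot y$, $y \cdot x$, $y \cdot y$ equals the squared determinant of the matrix whose rows are $x$ and $y$.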
  • Open

    3 Questions: Honing robot perception and mapping
    Luca Carlone and Jonathan How of MIT LIDS discuss how future robots might perceive and interact with their environment.  ( 8 min )
  • Open

    Design Speed Takes the Lead: Trek Bicycle Competes in Tour de France With Bikes Developed Using NVIDIA GPUs
    NVIDIA RTX is spinning new cycles for designs. Trek Bicycle is using GPUs to bring design concepts to life. The Wisconsin-based company, one of the largest bicycle manufacturers in the world, aims to create bikes with the highest-quality craftsmanship. With its new partner Lidl, an international retailer chain, Trek Bicycle also owns a cycling team, Read article >  ( 7 min )
  • Open

    Renovating computer systems securely and progressively with APRON
    This research paper was accepted by 2023 USENIX Annual Technical Conference (ATC), which is dedicated to advancing the field of systems research. Whether they’re personal computers or cloud instances, it’s crucial to ensure that the computer systems people use every day are reliable and secure. The validity of these systems is critical because if storage […] The post Renovating computer systems securely and progressively with APRON appeared first on Microsoft Research.  ( 10 min )
  • Open

    AIOps above the radar – Using AI to monitor your AI infrastructure
    When an enterprise project is low-profile (“below the radar”), then it is not likely to be the target of bad actors. Similarly, if some part of that project’s infrastructure fails or falters, then the consequences of the problem and/or the urgency of providing a solution are usually manageable. But when a high-profile (“above the radar”)… Read More »AIOps above the radar – Using AI to monitor your AI infrastructure The post AIOps above the radar – Using AI to monitor your AI infrastructure appeared first on Data Science Central.  ( 22 min )
    Security data lakes and the future of organizational security
    Evolving technological advancements have created a far more data-centric world. This has dramatically changed the enterprise landscape, while also creating more data silos. The explosion of cybersecurity tools and mounds of data in modern enterprises have made it difficult to combine data to create a unified view. This has resulted in siloed data that’s also… Read More »Security data lakes and the future of organizational security The post Security data lakes and the future of organizational security appeared first on Data Science Central.  ( 21 min )
    Sentience: AI has demystified human consciousness, intelligence
    There is a recent article, Unraveling the Mystery of Human Consciousness, where it was stated that, “Consciousness makes us capable of experiencing the scent of a rose, the touch of a breeze, the taste of food, the sound of music, and the sight of a sunrise. We also have a unique ability to be aware… Read More »Sentience: AI has demystified human consciousness, intelligence The post Sentience: AI has demystified human consciousness, intelligence appeared first on Data Science Central.  ( 19 min )
    Exploring intelligent search solutions: A comparative analysis of Amazon Kendra integration and large language model crawlers
    Amazon Kendra and LLamaIndex can help with knowledge integration but fall short in connecting diverse knowledge sources, to enable efficient intelligent search. In this article, we compare the existing solutions and explain how to overcome their limitations using a Google Drive crawler. Companies often face difficulties in consolidating their knowledge base when their data is… Read More »Exploring intelligent search solutions: A comparative analysis of Amazon Kendra integration and large language model crawlers The post Exploring intelligent search solutions: A comparative analysis of Amazon Kendra integration and large language model crawlers appeared first on Data Science Central.  ( 27 min )
    What’s missing from ChatGPT and other LLMs?
    Recent developments in artificial intelligence remind me of the automotive industry in the late 19th and early 20th century. In that case, it took the industry several decades to commit to internal combustion engines. And while that picture was still unclear, there were over 250 different car manufacturers, some of whom were producing steam-powered cars.… Read More »What’s missing from ChatGPT and other LLMs? The post What’s missing from ChatGPT and other LLMs? appeared first on Data Science Central.  ( 21 min )
  • Open

    Google at ACL 2023
Posted by Malaya Jules, Program Manager, Google This week, the 61st annual meeting of the Association for Computational Linguistics (ACL), a premier conference covering a broad spectrum of research areas that are concerned with computational approaches to natural language, is taking place online. As a leader in natural language processing and understanding, and a Diamond Level sponsor of ACL 2023, Google will showcase the latest research in the field with over 50 publications, and active involvement in a variety of workshops and tutorials. Board and Organizing Committee: Dan Garrette. Workshop chairs include: Annie Louis. Publication chairs include: Lei Shu. Program Committee includes: Vinodkumar Prabhakaran, Najoung Kim, Markus Freitag. Spotlight papers: NusaCrowd: O…  ( 93 min )
  • Open

    A lighting-fast and developer-friendly Federated Learning SDK
Hello Reddit 👋 I am contacting you because I am one of the community relations coordinators of MetisFL, a federated learning framework. https://github.com/NevronAI/metisfl#readme I would really love and appreciate it if you checked out and starred our repo and considered using the framework or contributing to it 🤗 The code is the result of several years of research on federated learning and was just released a few weeks ago, so you would be one of the first outside users/contributors! We are currently preparing for the first stable release of our pip package and any community input would be much appreciated! Please do not hesitate to reach out to us anytime; we are always available on our Slack channel to answer questions about MetisFL, solve technical issues, or chat about the future of tech and AI! Slack is pretty new as well; you'd be one of the first outside members! Hope to see you in our online community! submitted by /u/No-Literature-1930 [link] [comments]  ( 8 min )
    Transformers
Does anyone know where I can get a more detailed explanation for a better understanding of RNNs, LSTMs, Transformers and BERT, along with practical implementations? submitted by /u/Potential_Plant_160 [link] [comments]  ( 8 min )
    Fast SAM (Segment Anything Model) review
You might find this interesting: a new post from the OpenCV.ai team about Fast SAM (Segment Anything Model). Short description: Fast SAM, an innovative approach to the Segment Anything task, drastically increases SAM's speed by 50 times. It introduces prompts to facilitate a broad range of problem-solving, breaking the barriers of traditional supervised learning. The blog post briefly explains the segment anything task and the Fast SAM approach. More details are here. submitted by /u/No-Independence5880 [link] [comments]  ( 8 min )
  • Open

    On the Stepwise Nature of Self-Supervised Learning
    Figure 1: stepwise behavior in self-supervised learning. When training common SSL algorithms, we find that the loss descends in a stepwise fashion (top left) and the learned embeddings iteratively increase in dimensionality (bottom left). Direct visualization of embeddings (right; top three PCA directions shown) confirms that embeddings are initially collapsed to a point, which then expands to a 1D manifold, a 2D manifold, and beyond concurrently with steps in the loss. It is widely believed that deep learning’s stunning success is due in part to its ability to discover and extract useful representations of complex data. Self-supervised learning (SSL) has emerged as a leading framework for learning these representations for images directly from unlabeled data, similar to how LLMs learn re…  ( 5 min )
  • Open

    GEANN: Scalable Graph Augmentations for Multi-Horizon Time Series Forecasting. (arXiv:2307.03595v1 [cs.LG])
    Encoder-decoder deep neural networks have been increasingly studied for multi-horizon time series forecasting, especially in real-world applications. However, to forecast accurately, these sophisticated models typically rely on a large number of time series examples with substantial history. A rapidly growing topic of interest is forecasting time series which lack sufficient historical data -- often referred to as the ``cold start'' problem. In this paper, we introduce a novel yet simple method to address this problem by leveraging graph neural networks (GNNs) as a data augmentation for enhancing the encoder used by such forecasters. These GNN-based features can capture complex inter-series relationships, and their generation process can be optimized end-to-end with the forecasting task. We show that our architecture can use either data-driven or domain knowledge-defined graphs, scaling to incorporate information from multiple very large graphs with millions of nodes. In our target application of demand forecasting for a large e-commerce retailer, we demonstrate on both a small dataset of 100K products and a large dataset with over 2 million products that our method improves overall performance over competitive baseline models. More importantly, we show that it brings substantially more gains to ``cold start'' products such as those newly launched or recently out-of-stock.  ( 2 min )
    ContextLabeler Dataset: physical and virtual sensors data collected from smartphone usage in-the-wild. (arXiv:2307.03586v1 [cs.HC])
    This paper describes a data collection campaign and the resulting dataset derived from smartphone sensors characterizing the daily life activities of 3 volunteers in a period of two weeks. The dataset is released as a collection of CSV files containing more than 45K data samples, where each sample is composed by 1332 features related to a heterogeneous set of physical and virtual sensors, including motion sensors, running applications, devices in proximity, and weather conditions. Moreover, each data sample is associated with a ground truth label that describes the user activity and the situation in which she was involved during the sensing experiment (e.g., working, at restaurant, and doing sport activity). To avoid introducing any bias during the data collection, we performed the sensing experiment in-the-wild, that is, by using the volunteers' devices, and without defining any constraint related to the user's behavior. For this reason, the collected dataset represents a useful source of real data to both define and evaluate a broad set of novel context-aware solutions (both algorithms and protocols) that aim to adapt their behavior according to the changes in the user's situation in a mobile environment.  ( 2 min )
    Equivariant Single View Pose Prediction Via Induced and Restricted Representations. (arXiv:2307.03704v1 [cs.CV])
    Learning about the three-dimensional world from two-dimensional images is a fundamental problem in computer vision. An ideal neural network architecture for such tasks would leverage the fact that objects can be rotated and translated in three dimensions to make predictions about novel images. However, imposing SO(3)-equivariance on two-dimensional inputs is difficult because the group of three-dimensional rotations does not have a natural action on the two-dimensional plane. Specifically, it is possible that an element of SO(3) will rotate an image out of plane. We show that an algorithm that learns a three-dimensional representation of the world from two dimensional images must satisfy certain geometric consistency properties which we formulate as SO(2)-equivariance constraints. We use the induced and restricted representations of SO(2) on SO(3) to construct and classify architectures which satisfy these geometric consistency constraints. We prove that any architecture which respects said consistency constraints can be realized as an instance of our construction. We show that three previously proposed neural architectures for 3D pose prediction are special cases of our construction. We propose a new algorithm that is a learnable generalization of previously considered methods. We test our architecture on three pose predictions task and achieve SOTA results on both the PASCAL3D+ and SYMSOL pose estimation tasks.
    Distilling Universal and Joint Knowledge for Cross-Domain Model Compression on Time Series Data. (arXiv:2307.03347v1 [cs.LG])
For many real-world time series tasks, the computational complexity of prevalent deep learning models often hinders deployment in resource-limited environments (e.g., smartphones). Moreover, due to the inevitable domain shift between model training (source) and deploying (target) stages, compressing those deep models under cross-domain scenarios becomes more challenging. Although some existing works have already explored cross-domain knowledge distillation for model compression, they are either biased to source data or heavily tangled between source and target data. To this end, we design a novel end-to-end framework called Universal and joint knowledge distillation (UNI-KD) for cross-domain model compression. In particular, we propose to transfer both the universal feature-level knowledge across source and target domains and the joint logit-level knowledge shared by both domains from the teacher to the student model via an adversarial learning scheme. More specifically, a feature-domain discriminator is employed to align teacher's and student's representations for universal knowledge transfer. A data-domain discriminator is utilized to prioritize the domain-shared samples for joint knowledge transfer. Extensive experimental results on four time series datasets demonstrate the superiority of our proposed method over state-of-the-art (SOTA) benchmarks.  ( 2 min )
    Auxiliary Functions as Koopman Observables: Data-Driven Polynomial Optimization for Dynamical Systems. (arXiv:2303.01483v2 [math.DS] UPDATED)
    We present a flexible data-driven method for dynamical system analysis that does not require explicit model discovery. The method is rooted in well-established techniques for approximating the Koopman operator from data and is implemented as a semidefinite program that can be solved numerically. Furthermore, the method is agnostic of whether data is generated through a deterministic or stochastic process, so its implementation requires no prior adjustments by the user to accommodate these different scenarios. Rigorous convergence results justify the applicability of the method, while also extending and uniting similar results from across the literature. Examples on discovering Lyapunov functions, performing ergodic optimization, and bounding extrema over attractors for both deterministic and stochastic dynamics exemplify these convergence results and demonstrate the performance of the method.
    One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention. (arXiv:2307.03576v1 [cs.LG])
    Recent works have empirically analyzed in-context learning and shown that transformers trained on synthetic linear regression tasks can learn to implement ridge regression, which is the Bayes-optimal predictor, given sufficient capacity [Aky\"urek et al., 2023], while one-layer transformers with linear self-attention and no MLP layer will learn to implement one step of gradient descent (GD) on a least-squares linear regression objective [von Oswald et al., 2022]. However, the theory behind these observations remains poorly understood. We theoretically study transformers with a single layer of linear self-attention, trained on synthetic noisy linear regression data. First, we mathematically show that when the covariates are drawn from a standard Gaussian distribution, the one-layer transformer which minimizes the pre-training loss will implement a single step of GD on the least-squares linear regression objective. Then, we find that changing the distribution of the covariates and weight vector to a non-isotropic Gaussian distribution has a strong impact on the learned algorithm: the global minimizer of the pre-training loss now implements a single step of $\textit{pre-conditioned}$ GD. However, if only the distribution of the responses is changed, then this does not have a large effect on the learned algorithm: even when the response comes from a more general family of $\textit{nonlinear}$ functions, the global minimizer of the pre-training loss still implements a single step of GD on a least-squares linear regression objective.
    When does the ID algorithm fail?. (arXiv:2307.03750v1 [stat.ME])
    The ID algorithm solves the problem of identification of interventional distributions of the form p(Y | do(a)) in graphical causal models, and has been formulated in a number of ways [12, 9, 6]. The ID algorithm is sound (outputs the correct functional of the observed data distribution whenever p(Y | do(a)) is identified in the causal model represented by the input graph), and complete (explicitly flags as a failure any input p(Y | do(a)) whenever this distribution is not identified in the causal model represented by the input graph). The reference [9] provides a result, the so called "hedge criterion" (Corollary 3), which aims to give a graphical characterization of situations when the ID algorithm fails to identify its input in terms of a structure in the input graph called the hedge. While the ID algorithm is, indeed, a sound and complete algorithm, and the hedge structure does arise whenever the input distribution is not identified, Corollary 3 presented in [9] is incorrect as stated. In this note, I outline the modern presentation of the ID algorithm, discuss a simple counterexample to Corollary 3, and provide a number of graphical characterizations of the ID algorithm failing to identify its input distribution.
    BOF-UCB: A Bayesian-Optimistic Frequentist Algorithm for Non-Stationary Contextual Bandits. (arXiv:2307.03587v1 [cs.LG])
    We propose a novel Bayesian-Optimistic Frequentist Upper Confidence Bound (BOF-UCB) algorithm for stochastic contextual linear bandits in non-stationary environments. This unique combination of Bayesian and frequentist principles enhances adaptability and performance in dynamic settings. The BOF-UCB algorithm utilizes sequential Bayesian updates to infer the posterior distribution of the unknown regression parameter, and subsequently employs a frequentist approach to compute the Upper Confidence Bound (UCB) by maximizing the expected reward over the posterior distribution. We provide theoretical guarantees of BOF-UCB's performance and demonstrate its effectiveness in balancing exploration and exploitation on synthetic datasets and classical control tasks in a reinforcement learning setting. Our results show that BOF-UCB outperforms existing methods, making it a promising solution for sequential decision-making in non-stationary environments.
    When Fair Classification Meets Noisy Protected Attributes. (arXiv:2307.03306v1 [cs.LG])
    The operationalization of algorithmic fairness comes with several practical challenges, not the least of which is the availability or reliability of protected attributes in datasets. In real-world contexts, practical and legal impediments may prevent the collection and use of demographic data, making it difficult to ensure algorithmic fairness. While initial fairness algorithms did not consider these limitations, recent proposals aim to achieve algorithmic fairness in classification by incorporating noisiness in protected attributes or not using protected attributes at all. To the best of our knowledge, this is the first head-to-head study of fair classification algorithms to compare attribute-reliant, noise-tolerant and attribute-blind algorithms along the dual axes of predictivity and fairness. We evaluated these algorithms via case studies on four real-world datasets and synthetic perturbations. Our study reveals that attribute-blind and noise-tolerant fair classifiers can potentially achieve similar level of performance as attribute-reliant algorithms, even when protected attributes are noisy. However, implementing them in practice requires careful nuance. Our study provides insights into the practical implications of using fair classification algorithms in scenarios where protected attributes are noisy or partially available.
    Federated Unlearning via Active Forgetting. (arXiv:2307.03363v1 [cs.LG])
The increasing concerns regarding the privacy of machine learning models have catalyzed the exploration of machine unlearning, i.e., a process that removes the influence of training data on machine learning models. This concern also arises in the realm of federated learning, prompting researchers to address the federated unlearning problem. However, federated unlearning remains challenging. Existing unlearning methods can be broadly categorized into two approaches, i.e., exact unlearning and approximate unlearning. Firstly, implementing exact unlearning, which typically relies on the partition-aggregation framework, in a distributed manner does not improve time efficiency theoretically. Secondly, existing federated (approximate) unlearning methods suffer from imprecise data influence estimation, significant computational burden, or both. To this end, we propose a novel federated unlearning framework based on incremental learning, which is independent of specific models and federated settings. Our framework differs from existing federated unlearning methods that rely on approximate retraining or data influence estimation. Instead, we leverage new memories to overwrite old ones, imitating the process of \textit{active forgetting} in neurology. Specifically, the model, intended to unlearn, serves as a student model that continuously learns from randomly initiated teacher models. To prevent catastrophic forgetting of non-target data, we utilize elastic weight consolidation to elastically constrain weight change. Extensive experiments on three benchmark datasets demonstrate the efficiency and effectiveness of our proposed method. The result of backdoor attacks demonstrates that our proposed method achieves satisfying completeness.
    STG-MTL: Scalable Task Grouping for Multi-Task Learning Using Data Map. (arXiv:2307.03374v1 [cs.LG])
Multi-Task Learning (MTL) is a powerful technique that has gained popularity due to its performance improvement over traditional Single-Task Learning (STL). However, MTL is often challenging because there is an exponential number of possible task groupings, which can make it difficult to choose the best one, and some groupings might produce performance degradation due to negative interference between tasks. Furthermore, existing solutions severely suffer from scalability issues, limiting any practical application. In our paper, we propose a new data-driven method that addresses these challenges and provides a scalable and modular solution for classification task grouping based on hand-crafted features, specifically Data Maps, which capture the training behavior for each classification task during the MTL training. We experiment with the method demonstrating its effectiveness, even on an unprecedented number of tasks (up to 100).
    Undecimated Wavelet Transform for Word Embedded Semantic Marginal Autoencoder in Security improvement and Denoising different Languages. (arXiv:2307.03679v1 [cs.CL])
By combining the undecimated wavelet transform within a Word Embedded Semantic Marginal Autoencoder (WESMA), this research study provides a novel strategy for improving security measures and denoising multiple languages. The incorporation of these strategies is intended to address the issues of robustness, privacy, and multilingualism in data processing applications. The undecimated wavelet transform is used as a feature extraction tool to identify prominent language patterns and structural qualities in the input data. The proposed system may successfully capture significant information while preserving the temporal and geographical links within the data by employing this transform. This improves security measures by increasing the system's ability to detect abnormalities, discover hidden patterns, and distinguish between legitimate content and dangerous threats. The Word Embedded Semantic Marginal Autoencoder also functions as an intelligent framework for dimensionality and noise reduction. The autoencoder effectively learns the underlying semantics of the data and reduces noise components by exploiting word embeddings and semantic context. As a result, data quality and accuracy are increased in following processing stages. The suggested methodology is tested using a diversified dataset that includes several languages and security scenarios. The experimental results show that the proposed approach is effective in attaining security enhancement and denoising capabilities across multiple languages. The system is strong in dealing with linguistic variances, producing consistent outcomes regardless of the language used. Furthermore, incorporating the undecimated wavelet transform considerably improves the system's ability to efficiently address complex security concerns.
    Evaluating the Effectiveness of Large Language Models in Representing Textual Descriptions of Geometry and Spatial Relations. (arXiv:2307.03678v1 [cs.CL])
    This research focuses on assessing the ability of large language models (LLMs) in representing geometries and their spatial relations. We utilize LLMs including GPT-2 and BERT to encode the well-known text (WKT) format of geometries and then feed their embeddings into classifiers and regressors to evaluate the effectiveness of the LLMs-generated embeddings for geometric attributes. The experiments demonstrate that while the LLMs-generated embeddings can preserve geometry types and capture some spatial relations (up to 73% accuracy), challenges remain in estimating numeric values and retrieving spatially related objects. This research highlights the need for improvement in terms of capturing the nuances and complexities of the underlying geospatial data and integrating domain knowledge to support various GeoAI applications using foundation models.
    SAR: Generalization of Physiological Agility and Dexterity via Synergistic Action Representation. (arXiv:2307.03716v1 [cs.RO])
    Learning effective continuous control policies in high-dimensional systems, including musculoskeletal agents, remains a significant challenge. Over the course of biological evolution, organisms have developed robust mechanisms for overcoming this complexity to learn highly sophisticated strategies for motor control. What accounts for this robust behavioral flexibility? Modular control via muscle synergies, i.e. coordinated muscle co-contractions, is considered to be one putative mechanism that enables organisms to learn muscle control in a simplified and generalizable action space. Drawing inspiration from this evolved motor control strategy, we use physiologically accurate human hand and leg models as a testbed for determining the extent to which a Synergistic Action Representation (SAR) acquired from simpler tasks facilitates learning more complex tasks. We find in both cases that SAR-exploiting policies significantly outperform end-to-end reinforcement learning. Policies trained with SAR were able to achieve robust locomotion on a wide set of terrains with high sample efficiency, while baseline approaches failed to learn meaningful behaviors. Additionally, policies trained with SAR on a multiobject manipulation task significantly outperformed (>70% success) baseline approaches (<20% success). Both of these SAR-exploiting policies were also found to generalize zero-shot to out-of-domain environmental conditions, while policies that did not adopt SAR failed to generalize. Finally, we establish the generality of SAR on broader high-dimensional control problems using a robotic manipulation task set and a full-body humanoid locomotion task. To the best of our knowledge, this investigation is the first of its kind to present an end-to-end pipeline for discovering synergies and using this representation to learn high-dimensional continuous control across a wide diversity of tasks.
    Scalable Membership Inference Attacks via Quantile Regression. (arXiv:2307.03694v1 [cs.LG])
    Membership inference attacks are designed to determine, using black box access to trained models, whether a particular example was used in training or not. Membership inference can be formalized as a hypothesis testing problem. The most effective existing attacks estimate the distribution of some test statistic (usually the model's confidence on the true label) on points that were (and were not) used in training by training many \emph{shadow models} -- i.e. models of the same architecture as the model being attacked, trained on a random subsample of data. While effective, these attacks are extremely computationally expensive, especially when the model under attack is large. We introduce a new class of attacks based on performing quantile regression on the distribution of confidence scores induced by the model under attack on points that are not used in training. We show that our method is competitive with state-of-the-art shadow model attacks, while requiring substantially less compute because our attack requires training only a single model. Moreover, unlike shadow model attacks, our proposed attack does not require any knowledge of the architecture of the model under attack and is therefore truly ``black-box". We show the efficacy of this approach in an extensive series of experiments on various datasets and model architectures.
    INT-FP-QSim: Mixed Precision and Formats For Large Language Models and Vision Transformers. (arXiv:2307.03712v1 [cs.LG])
    The recent rise of large language models (LLMs) has resulted in increased efforts towards running LLMs at reduced precision. Running LLMs at lower precision supports resource constraints and furthers their democratization, enabling users to run billion-parameter LLMs on their personal devices. To supplement this ongoing effort, we propose INT-FP-QSim: an open-source simulator that enables flexible evaluation of LLMs and vision transformers at various numerical precisions and formats. INT-FP-QSim leverages existing open-source repositories such as TensorRT, QPytorch and AIMET for a combined simulator that supports various floating point and integer formats. With the help of our simulator, we survey the impact of different numerical formats on the performance of LLMs and vision transformers at 4-bit weights and 4-bit or 8-bit activations. We also compare recently proposed methods like Adaptive Block Floating Point, SmoothQuant, GPTQ and RPTQ on the model performances. We hope INT-FP-QSim will enable researchers to flexibly simulate models at various precisions to support further research in quantization of LLMs and vision transformers.
    Learning Theory of Distribution Regression with Neural Networks. (arXiv:2307.03487v1 [stat.ML])
    In this paper, we aim at establishing an approximation theory and a learning theory of distribution regression via a fully connected neural network (FNN). In contrast to the classical regression methods, the input variables of distribution regression are probability measures. Then we often need to perform a second-stage sampling process to approximate the actual information of the distribution. On the other hand, the classical neural network structure requires the input variable to be a vector. When the input samples are probability distributions, the traditional deep neural network method cannot be directly used and the difficulty arises for distribution regression. A well-defined neural network structure for distribution inputs is intensively desirable. There is no mathematical model and theoretical analysis on neural network realization of distribution regression. To overcome technical difficulties and address this issue, we establish a novel fully connected neural network framework to realize an approximation theory of functionals defined on the space of Borel probability measures. Furthermore, based on the established functional approximation results, in the hypothesis space induced by the novel FNN structure with distribution inputs, almost optimal learning rates for the proposed distribution regression model up to logarithmic terms are derived via a novel two-stage error decomposition technique.  ( 2 min )
    Accelerated Optimization Landscape of Linear-Quadratic Regulator. (arXiv:2307.03590v1 [math.OC])
Linear-quadratic regulator (LQR) is a landmark problem in the field of optimal control, which is the concern of this paper. Generally, LQR is classified into state-feedback LQR (SLQR) and output-feedback LQR (OLQR) based on whether the full state is obtained. It has been suggested in existing literature that both the SLQR and the OLQR could be viewed as \textit{constrained nonconvex matrix optimization} problems in which the only variable to be optimized is the feedback gain matrix. In this paper, we introduce a first-order accelerated optimization framework of handling the LQR problem, and give its convergence analysis for the cases of SLQR and OLQR, respectively. Specifically, a Lipschitz Hessian property of the LQR performance criterion is presented, which turns out to be a crucial property for the application of modern optimization techniques. For the SLQR problem, a continuous-time hybrid dynamic system is introduced, whose solution trajectory is shown to converge exponentially to the optimal feedback gain with Nesterov-optimal order $1-\frac{1}{\sqrt{\kappa}}$ ($\kappa$ the condition number). Then, the symplectic Euler scheme is utilized to discretize the hybrid dynamic system, and a Nesterov-type method with a restarting rule is proposed that preserves the continuous-time convergence rate, i.e., the discretized algorithm admits the Nesterov-optimal convergence order. For the OLQR problem, a Hessian-free accelerated framework is proposed, which is a two-procedure method consisting of semiconvex function optimization and negative curvature exploitation. In a time $\mathcal{O}(\epsilon^{-7/4}\log(1/\epsilon))$, the method can find an $\epsilon$-stationary point of the performance criterion; this entails that the method improves upon the $\mathcal{O}(\epsilon^{-2})$ complexity of vanilla gradient descent. Moreover, our method provides the second-order guarantee of stationary point.
    PAC bounds of continuous Linear Parameter-Varying systems related to neural ODEs. (arXiv:2307.03630v1 [cs.LG])
    We consider the problem of learning Neural Ordinary Differential Equations (neural ODEs) within the context of Linear Parameter-Varying (LPV) systems in continuous-time. LPV systems contain bilinear systems which are known to be universal approximators for non-linear systems. Moreover, a large class of neural ODEs can be embedded into LPV systems. As our main contribution we provide Probably Approximately Correct (PAC) bounds under stability for LPV systems related to neural ODEs. The resulting bounds have the advantage that they do not depend on the integration interval.
    Online Network Source Optimization with Graph-Kernel MAB. (arXiv:2307.03641v1 [cs.LG])
We propose Grab-UCB, a graph-kernel multi-armed bandit algorithm that learns online the optimal source placement in large-scale networks, such that the reward obtained from a priori unknown network processes is maximized. The uncertainty calls for online learning, which however suffers from the curse of dimensionality. To achieve sample efficiency, we describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations. This enables a data-efficient learning framework whose learning rate scales with the dimension of the spectral representation model instead of that of the network. Grab-UCB is an online sequential decision strategy that learns the parameters of the spectral representation while optimizing the action strategy. We derive performance guarantees that depend on network parameters, which in turn influence the learning curve of the sequential decision strategy. We also introduce a computationally simplified solving method, Grab-arm-Light, an algorithm that walks along the edges of the polytope representing the objective function. Simulation results show that the proposed online learning algorithm outperforms baseline offline methods that typically separate the learning phase from the testing one. The results confirm the theoretical findings and further highlight the gain of the proposed online learning strategy in terms of cumulative regret, sample efficiency, and computational complexity.
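The sample-efficiency argument, pulling the confidence bonus from a low-dimensional spectral representation rather than the full network, can be illustrated with a generic LinUCB-style loop. This is a sketch under assumed toy interfaces (`features[a]` standing in for the spectral representation of candidate placement `a`), not the Grab-UCB algorithm itself.

```python
import numpy as np

def linucb(features, reward_fn, T=1000, alpha=1.0, lam=1.0):
    """Linear UCB over a d-dimensional representation of each arm."""
    n_arms, d = features.shape
    A = lam * np.eye(d)                      # regularized Gram matrix
    b = np.zeros(d)
    for t in range(T):
        A_inv = np.linalg.inv(A)
        theta = A_inv @ b                    # ridge estimate of the reward model
        bonus = np.sqrt(np.einsum('ai,ij,aj->a', features, A_inv, features))
        a = int(np.argmax(features @ theta + alpha * bonus))  # optimistic action
        r = reward_fn(a)
        A += np.outer(features[a], features[a])
        b += r * features[a]
    return theta
```

The regret of such a loop scales with the representation dimension d, mirroring the dimensionality argument above.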
    Differentiable Turbulence. (arXiv:2307.03683v1 [physics.flu-dyn])
Deep learning is increasingly becoming a promising pathway to improving the accuracy of sub-grid scale (SGS) turbulence closure models for large eddy simulations (LES). We leverage the concept of differentiable turbulence, whereby an end-to-end differentiable solver is used in combination with physics-inspired choices of deep learning architectures to learn highly effective and versatile SGS models for two-dimensional turbulent flow. We perform an in-depth analysis of the inductive biases in the chosen architectures, finding that the inclusion of small-scale non-local features is most critical to effective SGS modeling, while large-scale features can improve pointwise accuracy of the a-posteriori solution field. The filtered velocity gradient tensor can be mapped directly to the SGS stress via decomposition of the inputs and outputs into isotropic, deviatoric, and anti-symmetric components. We see that the model can generalize to a variety of flow configurations, including higher and lower Reynolds numbers and different forcing conditions. We show that the differentiable physics paradigm is more successful than offline, a-priori learning, and that hybrid solver-in-the-loop approaches to deep learning offer an ideal balance between computational efficiency, accuracy, and generalization. Our experiments provide physics-based recommendations for generalizable deep-learning-based SGS closure modeling of turbulence.
    Incentive-Theoretic Bayesian Inference for Collaborative Science. (arXiv:2307.03748v1 [stat.ME])
Contemporary scientific research is a distributed, collaborative endeavor, carried out by teams of researchers, regulatory institutions, funding agencies, commercial partners, and scientific bodies, all interacting with each other and facing different incentives. To maintain scientific rigor, statistical methods should acknowledge this state of affairs. To this end, we study hypothesis testing when there is an agent (e.g., a researcher or a pharmaceutical company) with a private prior about an unknown parameter and a principal (e.g., a policymaker or regulator) who wishes to make decisions based on the parameter value. The agent chooses whether to run a statistical trial based on their private prior, and then the result of the trial is used by the principal to reach a decision. We show how the principal can conduct statistical inference that leverages the information revealed by an agent's strategic behavior, namely their choice to run a trial or not. In particular, we show how the principal can design a policy to elicit partial information about the agent's private prior beliefs and use this to control the posterior probability of the null. One implication is a simple guideline for the choice of significance threshold in clinical trials: the type-I error level should be set to be strictly less than the cost of the trial divided by the firm's profit if the trial is successful.
    PseudoCell: Hard Negative Mining as Pseudo Labeling for Deep Learning-Based Centroblast Cell Detection. (arXiv:2307.03211v1 [q-bio.QM])
Patch classification models based on deep learning have been utilized in whole-slide images (WSI) of H&E-stained tissue samples to assist pathologists in grading follicular lymphoma patients. However, these approaches still require pathologists to manually identify centroblast cells and provide refined labels for optimal performance. To address this, we propose PseudoCell, an object detection framework to automate centroblast detection in WSI (source code is available at https://github.com/IoBT-VISTEC/PseudoCell.git). This framework incorporates centroblast labels from pathologists and combines them with pseudo-negative labels obtained from undersampled false-positive predictions using the cell's morphological features. By employing PseudoCell, pathologists' workload can be reduced, as it accurately narrows down the areas requiring their attention when examining tissue. Depending on the confidence threshold, PseudoCell can eliminate 58.18-99.35% of non-centroblast tissue areas on WSI. This study presents a practical centroblast prescreening method that does not require pathologists' refined labels for improvement. Detailed guidance on the practical implementation of PseudoCell is provided in the discussion section.  ( 2 min )
    Unpaired Multi-View Graph Clustering with Cross-View Structure Matching. (arXiv:2307.03476v1 [cs.LG])
Multi-view clustering (MVC), which effectively fuses information from multiple views for better performance, has received increasing attention. Most existing MVC methods assume that multi-view data are fully paired, which means that the mappings of all corresponding samples between views are pre-defined or given in advance. However, the data correspondence is often incomplete in real-world applications due to data corruption or sensor differences, referred to as the data-unpaired problem (DUP) in the multi-view literature. Although several attempts have been made to address the DUP issue, they suffer from the following drawbacks: 1) Most methods focus on the feature representation while ignoring the structural information of multi-view data, which is essential for clustering tasks; 2) Existing methods for partially unpaired problems rely on pre-given cross-view alignment information, resulting in their inability to handle fully unpaired problems; 3) The parameters these methods inevitably introduce degrade the efficiency and applicability of the models. To tackle these issues, we propose a novel parameter-free graph clustering framework termed Unpaired Multi-view Graph Clustering with Cross-View Structure Matching (UPMGC-SM). Specifically, unlike existing methods, UPMGC-SM effectively utilizes the structural information from each view to refine cross-view correspondences. Besides, UPMGC-SM is a unified framework for both fully and partially unpaired multi-view graph clustering. Moreover, existing graph clustering methods can adopt UPMGC-SM to enhance their ability to handle unpaired scenarios. Extensive experiments demonstrate the effectiveness and generalization of our proposed framework on both paired and unpaired datasets.
    Programmable Synthetic Tabular Data Generation. (arXiv:2307.03577v1 [cs.LG])
    Large amounts of tabular data remain underutilized due to privacy, data quality, and data sharing limitations. While training a generative model producing synthetic data resembling the original distribution addresses some of these issues, most applications require additional constraints from the generated data. Existing synthetic data approaches are limited as they typically only handle specific constraints, e.g., differential privacy (DP) or increased fairness, and lack an accessible interface for declaring general specifications. In this work, we introduce ProgSyn, the first programmable synthetic tabular data generation algorithm that allows for comprehensive customization over the generated data. To ensure high data quality while adhering to custom specifications, ProgSyn pre-trains a generative model on the original dataset and fine-tunes it on a differentiable loss automatically derived from the provided specifications. These can be programmatically declared using statistical and logical expressions, supporting a wide range of requirements (e.g., DP or fairness, among others). We conduct an extensive experimental evaluation of ProgSyn on a number of constraints, achieving a new state-of-the-art on some, while remaining general. For instance, at the same fairness level we achieve 2.3% higher downstream accuracy than the state-of-the-art in fair synthetic data generation on the Adult dataset. Overall, ProgSyn provides a versatile and accessible framework for generating constrained synthetic tabular data, allowing for specifications that generalize beyond the capabilities of prior work.  ( 2 min )
    Encoder-Decoder Networks for Self-Supervised Pretraining and Downstream Signal Bandwidth Regression on Digital Antenna Arrays. (arXiv:2307.03327v1 [cs.LG])
This work presents the first application of self-supervised learning to data from digital antenna arrays. Encoder-decoder networks are pretrained on digital array data to perform a self-supervised noisy-reconstruction task called channel in-painting, in which the network infers the contents of array data that has been masked with zeros. The self-supervised step requires no human-labeled data. The encoder architecture and weights from pretraining are then transferred to a new network with a task-specific decoder, and the new network is trained on a small volume of labeled data. We show that pretraining on the unlabeled data allows the new network to perform the task of bandwidth regression on the digital array data better than an equivalent network trained on the same labeled data from random initialization.
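The channel in-painting pretext task can be sketched in a few lines: zero-mask a random subset of array channels and train an encoder-decoder to reconstruct them, scoring the loss only on the masked channels. The shapes and the tiny MLP below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

n_ch, n_samp = 16, 128                       # array channels x time samples
enc = nn.Sequential(nn.Flatten(), nn.Linear(n_ch * n_samp, 256), nn.ReLU())
dec = nn.Linear(256, n_ch * n_samp)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

for step in range(100):
    x = torch.randn(32, n_ch, n_samp)        # stand-in for recorded array data
    mask = (torch.rand(32, n_ch, 1) > 0.25).float()
    recon = dec(enc(x * mask)).view_as(x)    # reconstruct from zero-masked channels
    loss = ((recon - x) ** 2 * (1 - mask)).mean()  # score only the masked channels
    opt.zero_grad(); loss.backward(); opt.step()
```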
    GeoPhy: Differentiable Phylogenetic Inference via Geometric Gradients of Tree Topologies. (arXiv:2307.03675v1 [cs.LG])
    Phylogenetic inference, grounded in molecular evolution models, is essential for understanding the evolutionary relationships in biological data. Accounting for the uncertainty of phylogenetic tree variables, which include tree topologies and evolutionary distances on branches, is crucial for accurately inferring species relationships from molecular data and tasks requiring variable marginalization. Variational Bayesian methods are key to developing scalable, practical models; however, it remains challenging to conduct phylogenetic inference without restricting the combinatorially vast number of possible tree topologies. In this work, we introduce a novel, fully differentiable formulation of phylogenetic inference that leverages a unique representation of topological distributions in continuous geometric spaces. Through practical considerations on design spaces and control variates for gradient estimations, our approach, GeoPhy, enables variational inference without limiting the topological candidates. In experiments using real benchmark datasets, GeoPhy significantly outperformed other approximate Bayesian methods that considered whole topologies.
    Backward Feature Correction: How Deep Learning Performs Deep (Hierarchical) Learning. (arXiv:2001.04413v6 [cs.LG] UPDATED)
Deep learning is also known as hierarchical learning, where the learner _learns_ to represent a complicated target function by decomposing it into a sequence of simpler functions to reduce sample and time complexity. This paper formally analyzes how multi-layer neural networks can perform such hierarchical learning _efficiently_ and _automatically_ by SGD on the training objective. On the conceptual side, we present a theoretical characterization of how certain types of deep (i.e. super-constant layer) neural networks can still be sample- and time-efficiently trained on some hierarchical tasks, when no existing algorithm (including layerwise training, kernel method, etc) is known to be efficient. We establish a new principle called "backward feature correction", where the errors in the lower-level features can be automatically corrected when training together with the higher-level layers. We believe this is a key to how deep learning performs deep (hierarchical) learning, as opposed to layerwise learning or simulating some non-hierarchical method. On the technical side, we show for every input dimension $d > 0$, there is a concept class of degree $\omega(1)$ multi-variate polynomials so that, using $\omega(1)$-layer neural networks as learners, SGD can learn any function from this class in $\mathsf{poly}(d)$ time to any $\frac{1}{\mathsf{poly}(d)}$ error, through learning to represent it as a composition of $\omega(1)$ layers of quadratic functions using "backward feature correction." In contrast, we do not know any other simpler algorithm (including layerwise training, applying kernel method sequentially, training a two-layer network, etc) that can learn this concept class in $\mathsf{poly}(d)$ time even to any $d^{-0.01}$ error. As a side result, we prove $d^{\omega(1)}$ lower bounds for several non-hierarchical learners, including any kernel methods.
    Equivariant Spherical CNN for Data Efficient and High-Performance Medical Image Processing. (arXiv:2307.03298v1 [eess.IV])
This work highlights the significance of equivariant networks as efficient and high-performance approaches for tomography applications. Our study builds upon the limitations of Convolutional Neural Networks (CNNs), which have shown promise in post-processing for various medical imaging systems. However, the efficiency of conventional CNNs heavily relies on a sufficiently large and well-curated training set. To tackle this issue, in this study we introduce an equivariant network, aiming to reduce the CNN's dependency on specific training sets. We evaluate the efficacy of equivariant CNNs on spherical signals for tomographic medical imaging problems. Our results demonstrate the superior quality and computational efficiency of spherical CNNs (SCNNs) in denoising and reconstructing benchmark problems. Furthermore, we propose a novel approach to employ SCNNs as a complement to conventional image reconstruction tools, enhancing the outcomes while reducing reliance on the training set. Across all cases, we observe a significant decrease in computational costs while maintaining the same or higher quality of image processing using SCNNs compared to CNNs. Additionally, we explore the potential of this network for broader tomography applications, particularly those requiring omnidirectional representation.  ( 2 min )
    Variational quantum regression algorithm with encoded data structure. (arXiv:2307.03334v1 [quant-ph])
Variational quantum algorithms (VQAs) prevail in solving practical problems such as combinatorial optimization, quantum chemistry simulation, quantum machine learning, and quantum error correction on noisy quantum computers. For variational quantum machine learning, a variational algorithm with model interpretability built into the algorithm is yet to be exploited. In this paper, we construct a quantum regression algorithm and identify the direct relation of variational parameters to learned regression coefficients, while employing a circuit that directly encodes the data in quantum amplitudes reflecting the structure of the classical data table. The algorithm is particularly suitable for well-connected qubits. With compressed encoding and digital-analog gate operation, the run-time complexity is logarithmically more advantageous than that for digital 2-local gate native hardware with the number of data entries encoded, a decent improvement on noisy intermediate-scale quantum computers and a minor improvement for large-scale quantum computing. Our suggested method of compressed binary encoding offers a remarkable reduction in the number of physical qubits needed when compared to the traditional one-hot-encoding technique with the same input data. The algorithm inherently performs linear regression but can also be used easily for nonlinear regression by building nonlinear features into the training data. In terms of the measured cost function, which distinguishes a good model from a poor one during training, it is effective only when the number of features is much less than the number of records for the encoded data structure to be observable. To echo this finding and to mitigate hardware noise in practice, ensemble model training from the quantum regression model learning with important feature selection from regularization is incorporated and illustrated numerically.
Simulation-free Schrödinger bridges via score and flow matching. (arXiv:2307.03672v1 [cs.LG])
We present simulation-free score and flow matching ([SF]$^2$M), a simulation-free objective for inferring stochastic dynamics given unpaired source and target samples drawn from arbitrary distributions. Our method generalizes both the score-matching loss used in the training of diffusion models and the recently proposed flow matching loss used in the training of continuous normalizing flows. [SF]$^2$M interprets continuous-time stochastic generative modeling as a Schrödinger bridge (SB) problem. It relies on static entropy-regularized optimal transport, or a minibatch approximation, to efficiently learn the SB without simulating the learned stochastic process. We find that [SF]$^2$M is more efficient and gives more accurate solutions to the SB problem than simulation-based methods from prior work. Finally, we apply [SF]$^2$M to the problem of learning cell dynamics from snapshot data. Notably, [SF]$^2$M is the first method to accurately model cell dynamics in high dimensions and can recover known gene regulatory networks from simulated data.
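For intuition, the deterministic flow-matching half of the objective can be written in a few lines: sample a point on the straight-line path between source and target samples and regress a network onto the conditional velocity. This reduced sketch pairs samples independently; the entropic-OT minibatch coupling and the score-matching term that make [SF]$^2$M a Schrödinger bridge solver are deliberately omitted.

```python
import torch
import torch.nn as nn

dim = 2
v = nn.Sequential(nn.Linear(dim + 1, 64), nn.ReLU(), nn.Linear(64, dim))
opt = torch.optim.Adam(v.parameters(), lr=1e-3)

for step in range(1000):
    x0 = torch.randn(256, dim)          # source samples
    x1 = torch.randn(256, dim) + 3.0    # target samples (toy shifted Gaussian)
    t = torch.rand(256, 1)
    xt = (1 - t) * x0 + t * x1          # point on the straight-line path
    target = x1 - x0                    # conditional velocity along that path
    loss = ((v(torch.cat([xt, t], dim=1)) - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```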
    F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning. (arXiv:2004.11145v2 [cs.LG] UPDATED)
Traditional centralized multi-agent reinforcement learning (MARL) algorithms are sometimes impractical in complicated applications due to non-interactivity between agents, the curse of dimensionality, and computational complexity. This has motivated several decentralized MARL algorithms. However, existing decentralized methods only handle the fully cooperative setting, where massive amounts of information need to be transmitted during training. The block coordinate gradient descent scheme they use for successive independent actor and critic steps simplifies the calculation, but it causes serious bias. In this paper, we propose a flexible, fully decentralized actor-critic MARL framework that can accommodate most actor-critic methods and handle large-scale general cooperative multi-agent settings. A primal-dual hybrid gradient descent type algorithm framework is designed to learn individual agents separately for decentralization. From the perspective of each agent, policy improvement and value evaluation are jointly optimized, which can stabilize multi-agent policy learning. Furthermore, our framework achieves scalability and stability for large-scale environments and reduces information transmission via a parameter-sharing mechanism and a novel modeling-other-agents method based on theory of mind and online supervised learning. Extensive experiments in the cooperative Multi-agent Particle Environment and StarCraft II show that our decentralized MARL instantiation algorithms perform competitively against conventional centralized and decentralized methods.
    QIGen: Generating Efficient Kernels for Quantized Inference on Large Language Models. (arXiv:2307.03738v1 [cs.LG])
    We present ongoing work on a new automatic code generation approach for supporting quantized generative inference on LLMs such as LLaMA or OPT on off-the-shelf CPUs. Our approach is informed by the target architecture and a performance model, including both hardware characteristics and method-specific accuracy constraints. Results on CPU-based inference for LLaMA models show that our approach can lead to high performance and high accuracy, comparing favorably to the best existing open-source solution. A preliminary implementation is available at https://github.com/IST-DASLab/QIGen.
    Hyperspectral and Multispectral Image Fusion Using the Conditional Denoising Diffusion Probabilistic Model. (arXiv:2307.03423v1 [eess.IV])
Hyperspectral images (HSI) carry a large amount of spectral information reflecting the characteristics of matter, but their spatial resolution is low due to the limitations of imaging technology. Complementary to this are multispectral images (MSI), e.g., RGB images, with high spatial resolution but insufficient spectral bands. Hyperspectral and multispectral image fusion is a technique for cost-effectively acquiring ideal images that have both high spatial and high spectral resolution. Many existing HSI and MSI fusion algorithms rely on known imaging degradation models, which are often not available in practice. In this paper, we propose a deep fusion method based on the conditional denoising diffusion probabilistic model, called DDPM-Fus. Specifically, DDPM-Fus contains a forward diffusion process which gradually adds Gaussian noise to the high spatial resolution HSI (HrHSI) and a reverse denoising process which learns to predict the desired HrHSI from its noisy version, conditioning on the corresponding high spatial resolution MSI (HrMSI) and low spatial resolution HSI (LrHSI). Once training is complete, the proposed DDPM-Fus runs the reverse process on the test HrMSI and LrHSI to generate the fused HrHSI. Experiments conducted on one indoor and two remote sensing datasets show the superiority of the proposed model when compared with other advanced deep learning-based fusion methods. The code for this work will be open-sourced at https://github.com/shuaikaishi/DDPMFus for reproducibility.  ( 3 min )
    Polybot: Training One Policy Across Robots While Embracing Variability. (arXiv:2307.03719v1 [cs.RO])
    Reusing large datasets is crucial to scale vision-based robotic manipulators to everyday scenarios due to the high cost of collecting robotic datasets. However, robotic platforms possess varying control schemes, camera viewpoints, kinematic configurations, and end-effector morphologies, posing significant challenges when transferring manipulation skills from one platform to another. To tackle this problem, we propose a set of key design decisions to train a single policy for deployment on multiple robotic platforms. Our framework first aligns the observation and action spaces of our policy across embodiments via utilizing wrist cameras and a unified, but modular codebase. To bridge the remaining domain shift, we align our policy's internal representations across embodiments through contrastive learning. We evaluate our method on a dataset collected over 60 hours spanning 6 tasks and 3 robots with varying joint configurations and sizes: the WidowX 250S, the Franka Emika Panda, and the Sawyer. Our results demonstrate significant improvements in success rate and sample efficiency for our policy when using new task data collected on a different robot, validating our proposed design decisions. More details and videos can be found on our anonymized project website: https://sites.google.com/view/polybot-multirobot
    MALIBO: Meta-learning for Likelihood-free Bayesian Optimization. (arXiv:2307.03565v1 [cs.LG])
    Bayesian optimization (BO) is a popular method to optimize costly black-box functions. While traditional BO optimizes each new target task from scratch, meta-learning has emerged as a way to leverage knowledge from related tasks to optimize new tasks faster. However, existing meta-learning BO methods rely on surrogate models that suffer from scalability issues and are sensitive to observations with different scales and noise types across tasks. Moreover, they often overlook the uncertainty associated with task similarity. This leads to unreliable task adaptation when only limited observations are obtained or when the new tasks differ significantly from the related tasks. To address these limitations, we propose a novel meta-learning BO approach that bypasses the surrogate model and directly learns the utility of queries across tasks. Our method explicitly models task uncertainty and includes an auxiliary model to enable robust adaptation to new tasks. Extensive experiments show that our method demonstrates strong anytime performance and outperforms state-of-the-art meta-learning BO methods in various benchmarks.
    Smoothing the Edges: A General Framework for Smooth Optimization in Sparse Regularization using Hadamard Overparametrization. (arXiv:2307.03571v1 [cs.LG])
    This paper introduces a smooth method for (structured) sparsity in $\ell_q$ and $\ell_{p,q}$ regularized optimization problems. Optimization of these non-smooth and possibly non-convex problems typically relies on specialized procedures. In contrast, our general framework is compatible with prevalent first-order optimization methods like Stochastic Gradient Descent and accelerated variants without any required modifications. This is accomplished through a smooth optimization transfer, comprising an overparametrization of selected model parameters using Hadamard products and a change of penalties. In the overparametrized problem, smooth and convex $\ell_2$ regularization of the surrogate parameters induces non-smooth and non-convex $\ell_q$ or $\ell_{p,q}$ regularization in the original parametrization. We show that our approach yields not only matching global minima but also equivalent local minima. This is particularly useful in non-convex sparse regularization, where finding global minima is NP-hard and local minima are known to generalize well. We provide a comprehensive overview consolidating various literature strands on sparsity-inducing parametrizations and propose meaningful extensions to existing approaches. The feasibility of our approach is evaluated through numerical experiments, which demonstrate that its performance is on par with or surpasses commonly used implementations of convex and non-convex regularization methods.
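For the $q=1$ case, the mechanism rests on the identity $\min_{u \odot v = w} \tfrac{1}{2}(\|u\|_2^2 + \|v\|_2^2) = \|w\|_1$: smooth, convex squared-$\ell_2$ penalties on the Hadamard factors induce an $\ell_1$ penalty on $w = u \odot v$, so a plain first-order optimizer applies without modification. A minimal sketch on a toy lasso-style problem of our own construction (not the paper's experiments):

```python
import torch

# Smooth surrogate for l1-regularized regression: parametrize w = u * v and
# penalize u, v with plain L2. By the identity above this induces an l1
# penalty on w, yet SGD runs unmodified on a smooth objective.
d, lam = 50, 0.1
X, y = torch.randn(200, d), torch.randn(200)
u = torch.randn(d, requires_grad=True)
v = torch.randn(d, requires_grad=True)
opt = torch.optim.SGD([u, v], lr=1e-2)

for step in range(2000):
    w = u * v                                  # Hadamard overparametrization
    loss = ((X @ w - y) ** 2).mean() + 0.5 * lam * (u.pow(2).sum() + v.pow(2).sum())
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    print((u * v).abs().lt(1e-3).float().mean())  # fraction of near-zero weights
```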
    Guideline for Trustworthy Artificial Intelligence -- AI Assessment Catalog. (arXiv:2307.03681v1 [cs.CY])
    Artificial Intelligence (AI) has made impressive progress in recent years and represents a key technology that has a crucial impact on the economy and society. However, it is clear that AI and business models based on it can only reach their full potential if AI applications are developed according to high quality standards and are effectively protected against new AI risks. For instance, AI bears the risk of unfair treatment of individuals when processing personal data e.g., to support credit lending or staff recruitment decisions. The emergence of these new risks is closely linked to the fact that the behavior of AI applications, particularly those based on Machine Learning (ML), is essentially learned from large volumes of data and is not predetermined by fixed programmed rules. Thus, the issue of the trustworthiness of AI applications is crucial and is the subject of numerous major publications by stakeholders in politics, business and society. In addition, there is mutual agreement that the requirements for trustworthy AI, which are often described in an abstract way, must now be made clear and tangible. One challenge to overcome here relates to the fact that the specific quality criteria for an AI application depend heavily on the application context and possible measures to fulfill them in turn depend heavily on the AI technology used. Lastly, practical assessment procedures are needed to evaluate whether specific AI applications have been developed according to adequate quality standards. This AI assessment catalog addresses exactly this point and is intended for two target groups: Firstly, it provides developers with a guideline for systematically making their AI applications trustworthy. Secondly, it guides assessors and auditors on how to examine AI applications for trustworthiness in a structured way.  ( 3 min )
    Stability and Generalization of Stochastic Compositional Gradient Descent Algorithms. (arXiv:2307.03357v1 [cs.LG])
Many machine learning tasks can be formulated as a stochastic compositional optimization (SCO) problem, examples being reinforcement learning, AUC maximization, and meta-learning, where the objective function involves a nested composition associated with an expectation. While a significant number of studies have been devoted to the convergence behavior of SCO algorithms, there is little work on understanding their generalization, i.e., how these learning algorithms built from training examples behave on future test examples. In this paper, we provide a stability and generalization analysis of stochastic compositional gradient descent algorithms through the lens of algorithmic stability in the framework of statistical learning theory. Firstly, we introduce a stability concept called compositional uniform stability and establish its quantitative relation with generalization for SCO problems. Then, we establish compositional uniform stability results for two popular stochastic compositional gradient descent algorithms, namely SCGD and SCSC. Finally, we derive dimension-independent excess risk bounds for SCGD and SCSC by trading off their stability results against optimization errors. To the best of our knowledge, these are the first known results on the stability and generalization analysis of stochastic compositional gradient descent algorithms.  ( 2 min )
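As a reference for the algorithms analyzed above, basic SCGD for $F(x) = f(\mathbb{E}[g(x;\zeta)])$ maintains a running average of the inner mapping to debias the nested expectation. The sketch below shows that update structure under assumed stochastic oracles; the step sizes are illustrative, and the paper's contribution is the stability and generalization analysis of such schemes, not this implementation.

```python
import numpy as np

def scgd(grad_f, g_sample, jac_g_sample, x0, alpha=0.01, beta=0.1, iters=5000):
    """Basic SCGD for F(x) = f(E[g(x; zeta)]).

    y tracks the inner expectation E[g(x)]; the outer update chains a
    stochastic Jacobian of g through grad_f evaluated at the estimate y.
    """
    x = x0.copy()
    y = g_sample(x)                                    # initialize inner estimate
    for t in range(iters):
        y = (1 - beta) * y + beta * g_sample(x)        # moving average of g(x)
        x = x - alpha * jac_g_sample(x).T @ grad_f(y)  # chain rule with estimate
    return x
```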
    Mitigating Negative Transfer with Task Awareness for Sexism, Hate Speech, and Toxic Language Detection. (arXiv:2307.03377v1 [cs.CL])
This paper proposes a novel approach to mitigate the negative transfer problem. In the field of machine learning, the common strategy is to apply the Single-Task Learning approach in order to train a supervised model to solve a specific task. Training a robust model requires a lot of data and a significant amount of computational resources, making this solution unfeasible in cases where data are unavailable or expensive to gather. Therefore another solution, based on the sharing of information between tasks, has been developed: Multi-Task Learning (MTL). Despite recent developments in MTL, the problem of negative transfer remains unsolved. Negative transfer is a phenomenon that occurs when noisy information is shared between tasks, resulting in a drop in performance. Our approach, based on the task awareness concept, reduces negative transfer while improving performance over the classic MTL solution. Moreover, it has been implemented in two unified architectures to detect Sexism, Hate Speech, and Toxic Language in text comments. The proposed architectures set a new state-of-the-art on both the EXIST-2021 and HatEval-2019 benchmarks.  ( 2 min )
    Exploring the Potential of Large Language Models (LLMs) in Learning on Graphs. (arXiv:2307.03393v1 [cs.LG])
Learning on Graphs has attracted immense attention due to its wide real-world applications. The most popular pipeline for learning on graphs with textual node attributes primarily relies on Graph Neural Networks (GNNs) and utilizes shallow text embeddings as initial node representations, which have limitations in general knowledge and deep semantic understanding. In recent years, Large Language Models (LLMs) have been proven to possess extensive common knowledge and powerful semantic comprehension abilities that have revolutionized existing workflows for handling text data. In this paper, we aim to explore the potential of LLMs in graph machine learning, especially the node classification task, and investigate two possible pipelines: LLMs-as-Enhancers and LLMs-as-Predictors. The former leverages LLMs to enhance nodes' text attributes with their massive knowledge and then generates predictions through GNNs. The latter attempts to directly employ LLMs as standalone predictors. We conduct comprehensive and systematic studies on these two pipelines under various settings. From comprehensive empirical results, we make original observations and find new insights that open new possibilities and suggest promising directions for leveraging LLMs for learning on graphs.  ( 2 min )
    On Formal Feature Attribution and Its Approximation. (arXiv:2307.03380v1 [cs.AI])
Recent years have witnessed the widespread use of artificial intelligence (AI) algorithms and machine learning (ML) models. Despite their tremendous success, a number of vital problems like ML model brittleness, their fairness, and the lack of interpretability warrant active development in explainable artificial intelligence (XAI) and formal ML model verification. The two major lines of work in XAI include feature selection methods, e.g. Anchors, and feature attribution techniques, e.g. LIME and SHAP. Despite their promise, most of the existing feature selection and attribution approaches are susceptible to a range of critical issues, including explanation unsoundness and out-of-distribution sampling. A recent formal approach to XAI (FXAI), although serving as an alternative to the above and free of these issues, suffers from a few other limitations. For instance, besides its scalability limitation, the formal approach is unable to tackle the feature attribution problem. Additionally, a formal explanation, despite being formally sound, is typically quite large, which hampers its applicability in practical settings. Motivated by the above, this paper proposes a way to apply the apparatus of formal XAI to the case of feature attribution based on formal explanation enumeration. Formal feature attribution (FFA) is argued to be advantageous over the existing methods, both formal and non-formal. Given the practical complexity of the problem, the paper then proposes an efficient technique for approximating exact FFA. Finally, it offers experimental evidence of the effectiveness of the proposed approximate FFA in comparison to the existing feature attribution algorithms, not only in terms of feature importance but also in terms of relative order.  ( 3 min )
    Teaching Arithmetic to Small Transformers. (arXiv:2307.03381v1 [cs.LG])
    Large language models like GPT-4 exhibit emergent capabilities across general-purpose tasks, such as basic arithmetic, when trained on extensive text data, even though these tasks are not explicitly encoded by the unsupervised, next-token prediction objective. This study investigates how small transformers, trained from random initialization, can efficiently learn arithmetic operations such as addition, multiplication, and elementary functions like square root, using the next-token prediction objective. We first demonstrate that conventional training data is not the most effective for arithmetic learning, and simple formatting changes can significantly improve accuracy. This leads to sharp phase transitions as a function of training data scale, which, in some cases, can be explained through connections to low-rank matrix completion. Building on prior work, we then train on chain-of-thought style data that includes intermediate step results. Even in the complete absence of pretraining, this approach significantly and simultaneously improves accuracy, sample complexity, and convergence speed. We also study the interplay between arithmetic and text data during training and examine the effects of few-shot prompting, pretraining, and model scale. Additionally, we discuss length generalization challenges. Our work highlights the importance of high-quality, instructive data that considers the particular characteristics of the next-word prediction objective for rapidly eliciting arithmetic capabilities.  ( 2 min )
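One formatting effect of the kind studied here is emitting the answer least-significant digit first, so the model produces digits in the same order in which carries propagate. A toy data generator contrasting the two formats is sketched below; the exact prompt templates are illustrative assumptions, not the paper's.

```python
import random

def make_addition_samples(n, reverse_answer=True, max_val=999):
    """Generate addition strings in plain or reversed-answer format."""
    samples = []
    for _ in range(n):
        a, b = random.randint(0, max_val), random.randint(0, max_val)
        ans = str(a + b)
        if reverse_answer:
            ans = ans[::-1]  # e.g. 128 + 367: "495" becomes "594"
        samples.append(f"{a}+{b}={ans}")
    return samples

print(make_addition_samples(3, reverse_answer=True))
```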
    Incentive Allocation in Vertical Federated Learning Based on Bankruptcy Problem. (arXiv:2307.03515v1 [cs.LG])
Vertical federated learning (VFL) is a promising approach for collaboratively training machine learning models using private data partitioned vertically across different parties. Ideally, in a VFL setting, the active party (the party possessing features of samples with labels) benefits by improving its machine learning model through collaboration with some passive parties (parties possessing additional features of the same samples without labels) in a privacy-preserving manner. However, motivating passive parties to participate in VFL can be challenging. In this paper, we focus on the problem of allocating incentives to the passive parties by the active party based on their contributions to the VFL process. We formulate this problem as an instance of the classical Bankruptcy Problem from cooperative game theory, whose solution corresponds to the Nucleolus, and solve it using the Talmud division rule. We evaluate our proposed method on synthetic and real-world datasets and show that it ensures fairness and stability in incentive allocation among passive parties who contribute their data to the federated model. Additionally, we compare our method to the existing solution of calculating Shapley values and show that our approach provides a more efficient solution with fewer computations.  ( 2 min )
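The Talmud division rule, which coincides with the Nucleolus of the associated bankruptcy game, awards half-claims via constrained equal awards when the estate is below half the total claim, and half-claims plus constrained equal awards on the remaining halves above it. A sketch, assuming party contributions have already been converted into numeric claims:

```python
import numpy as np

def constrained_equal_awards(claims, estate):
    """Each claimant receives min(claim, lam), with lam set by bisection
    so that the awards sum to the estate."""
    lo, hi = 0.0, float(max(claims))
    for _ in range(100):
        lam = (lo + hi) / 2
        if np.minimum(claims, lam).sum() < estate:
            lo = lam
        else:
            hi = lam
    return np.minimum(claims, lam)

def talmud_rule(claims, estate):
    """Talmud bankruptcy rule: CEA on half-claims below half the total
    claim; half-claims plus CEA of the remainder above it."""
    claims = np.asarray(claims, dtype=float)
    half = claims / 2
    if estate <= half.sum():
        return constrained_equal_awards(half, estate)
    return half + constrained_equal_awards(half, estate - half.sum())

print(talmud_rule([100, 200, 300], 200))   # classic example: approx. [50, 75, 75]
```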
    Optimal Scalarizations for Sublinear Hypervolume Regret. (arXiv:2307.03288v1 [cs.LG])
Scalarization is a general technique that can be deployed in any multiobjective setting to reduce multiple objectives into one, as recently done in RLHF for training reward models that align human preferences. Yet some have dismissed this classical approach because linear scalarizations are known to miss concave regions of the Pareto frontier. To that end, we aim to find simple non-linear scalarizations that can explore a diverse set of $k$ objectives on the Pareto frontier, as measured by the dominated hypervolume. We show that hypervolume scalarizations with uniformly random weights are surprisingly optimal for provably minimizing the hypervolume regret, achieving an optimal sublinear regret bound of $O(T^{-1/k})$, with matching lower bounds that preclude any algorithm from doing better asymptotically. As a theoretical case study, we consider the multiobjective stochastic linear bandits problem and demonstrate that by exploiting the sublinear regret bounds of the hypervolume scalarizations, we can derive a novel non-Euclidean analysis that produces improved hypervolume regret bounds of $\tilde{O}( d T^{-1/2} + T^{-1/k})$. We support our theory with the strong empirical performance of simple hypervolume scalarizations, which consistently outperform both the linear and Chebyshev scalarizations, as well as standard multiobjective algorithms in Bayesian optimization, such as EHVI.  ( 2 min )
    Suppressing unknown disturbances to dynamical systems using machine learning. (arXiv:2307.03690v1 [eess.SY])
    Identifying and suppressing unknown disturbances to dynamical systems is a problem with applications in many different fields. In this Letter, we present a model-free method to identify and suppress an unknown disturbance to an unknown system based only on previous observations of the system under the influence of a known forcing function. We find that, under very mild restrictions on the training function, our method is able to robustly identify and suppress a large class of unknown disturbances. We illustrate our scheme with an example where a chaotic disturbance to the Lorenz system is identified and suppressed.  ( 2 min )
    HoughLaneNet: Lane Detection with Deep Hough Transform and Dynamic Convolution. (arXiv:2307.03494v1 [cs.CV])
    The task of lane detection has garnered considerable attention in the field of autonomous driving due to its complexity. Lanes can present difficulties for detection, as they can be narrow, fragmented, and often obscured by heavy traffic. However, it has been observed that the lanes have a geometrical structure that resembles a straight line, leading to improved lane detection results when utilizing this characteristic. To address this challenge, we propose a hierarchical Deep Hough Transform (DHT) approach that combines all lane features in an image into the Hough parameter space. Additionally, we refine the point selection method and incorporate a Dynamic Convolution Module to effectively differentiate between lanes in the original image. Our network architecture comprises a backbone network, either a ResNet or Pyramid Vision Transformer, a Feature Pyramid Network as the neck to extract multi-scale features, and a hierarchical DHT-based feature aggregation head to accurately segment each lane. By utilizing the lane features in the Hough parameter space, the network learns dynamic convolution kernel parameters corresponding to each lane, allowing the Dynamic Convolution Module to effectively differentiate between lane features. Subsequently, the lane features are fed into the feature decoder, which predicts the final position of the lane. Our proposed network structure demonstrates improved performance in detecting heavily occluded or worn lane images, as evidenced by our extensive experimental results, which show that our method outperforms or is on par with state-of-the-art techniques.  ( 3 min )
    A Vulnerability of Attribution Methods Using Pre-Softmax Scores. (arXiv:2307.03305v1 [cs.LG])
We discuss a vulnerability involving a category of attribution methods used to provide explanations for the outputs of convolutional neural networks working as classifiers. It is known that this type of network is vulnerable to adversarial attacks, in which imperceptible perturbations of the input may alter the outputs of the model. In contrast, here we focus on the effects that small modifications of the model may have on the attribution method without altering the model outputs.  ( 2 min )
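The vulnerability can be reproduced in a few lines: adding the same input-dependent offset to every logit leaves the softmax outputs (and hence the classifier's behavior) unchanged, yet changes the gradients of any individual pre-softmax score, which is what attributions computed from those scores rely on. A minimal demonstration with assumed toy shapes:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
base = nn.Linear(10, 3)   # stand-in classifier producing logits
g = nn.Linear(10, 1)      # arbitrary input-dependent offset shared by all classes

x = torch.randn(1, 10, requires_grad=True)
logits = base(x)
logits_mod = base(x) + g(x)  # g(x) broadcasts: same offset added to every logit

# Softmax outputs (and thus predictions) are identical...
print(torch.allclose(logits.softmax(-1), logits_mod.softmax(-1)))   # True

# ...but gradients of a single pre-softmax score, the basis of the
# attribution methods in question, differ.
grad_orig, = torch.autograd.grad(logits[0, 0], x, retain_graph=True)
grad_mod, = torch.autograd.grad(logits_mod[0, 0], x)
print(torch.allclose(grad_orig, grad_mod))                          # False
```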
    CSCLog: A Component Subsequence Correlation-Aware Log Anomaly Detection Method. (arXiv:2307.03359v1 [cs.LG])
    Anomaly detection based on system logs plays an important role in intelligent operations, which is a challenging task due to the extremely complex log patterns. Existing methods detect anomalies by capturing the sequential dependencies in log sequences, which ignore the interactions of subsequences. To this end, we propose CSCLog, a Component Subsequence Correlation-Aware Log anomaly detection method, which not only captures the sequential dependencies in subsequences, but also models the implicit correlations of subsequences. Specifically, subsequences are extracted from log sequences based on components and the sequential dependencies in subsequences are captured by Long Short-Term Memory Networks (LSTMs). An implicit correlation encoder is introduced to model the implicit correlations of subsequences adaptively. In addition, Graph Convolution Networks (GCNs) are employed to accomplish the information interactions of subsequences. Finally, attention mechanisms are exploited to fuse the embeddings of all subsequences. Extensive experiments on four publicly available log datasets demonstrate the effectiveness of CSCLog, outperforming the best baseline by an average of 7.41% in Macro F1-Measure.  ( 2 min )
    Optimal Bandwidth Selection for DENCLUE. (arXiv:2307.03206v1 [cs.LG])
In modern-day industry, clustering algorithms are a daily routine for algorithm engineers. Although clustering algorithms experienced rapid growth before 2010, innovation on this topic has stagnated since deep learning became the de facto industrial standard for machine learning applications. In 2007, a density-based clustering algorithm named DENCLUE was invented to solve the clustering problem for nonlinear data structures. However, its parameter selection problem was largely neglected until 2011. In this paper, we propose a new approach to compute the optimal parameters for the DENCLUE algorithm and discuss its performance in the experiment section.  ( 2 min )
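The paper's optimal-parameter computation is not reproduced here, but a standard baseline for the smoothing (bandwidth) parameter of a kernel-density-based method like DENCLUE is a rule of thumb such as Silverman's; the sketch below is that classical 1-D rule, offered only as a point of comparison.

```python
import numpy as np

def silverman_bandwidth(x):
    """Silverman's rule-of-thumb bandwidth for a 1-D Gaussian KDE."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    iqr = np.subtract(*np.percentile(x, [75, 25]))  # interquartile range
    sigma = min(x.std(ddof=1), iqr / 1.34)          # robust scale estimate
    return 0.9 * sigma * n ** (-1 / 5)

rng = np.random.default_rng(0)
print(silverman_bandwidth(rng.normal(size=1000)))
```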
    On the Evolution of (Hateful) Memes by Means of Multimodal Contrastive Learning. (arXiv:2212.06573v2 [cs.SI] UPDATED)
    The dissemination of hateful memes online has adverse effects on social media platforms and the real world. Detecting hateful memes is challenging, one of the reasons being the evolutionary nature of memes; new hateful memes can emerge by fusing hateful connotations with other cultural ideas or symbols. In this paper, we propose a framework that leverages multimodal contrastive learning models, in particular OpenAI's CLIP, to identify targets of hateful content and systematically investigate the evolution of hateful memes. We find that semantic regularities exist in CLIP-generated embeddings that describe semantic relationships within the same modality (images) or across modalities (images and text). Leveraging this property, we study how hateful memes are created by combining visual elements from multiple images or fusing textual information with a hateful image. We demonstrate the capabilities of our framework for analyzing the evolution of hateful memes by focusing on antisemitic memes, particularly the Happy Merchant meme. Using our framework on a dataset extracted from 4chan, we find 3.3K variants of the Happy Merchant meme, with some linked to specific countries, persons, or organizations. We envision that our framework can be used to aid human moderators by flagging new variants of hateful memes so that moderators can manually verify them and mitigate the problem of hateful content online.
    AI-UPV at EXIST 2023 -- Sexism Characterization Using Large Language Models Under The Learning with Disagreements Regime. (arXiv:2307.03385v1 [cs.CL])
With the increasing influence of social media platforms, it has become crucial to develop automated systems capable of detecting instances of sexism and other disrespectful and hateful behaviors to promote a more inclusive and respectful online environment. Nevertheless, these tasks are considerably challenging considering different hate categories and the author's intentions, especially under the learning with disagreements regime. This paper describes the AI-UPV team's participation in the EXIST (sEXism Identification in Social neTworks) Lab at CLEF 2023. The proposed approach addresses the task of sexism identification and characterization under the learning with disagreements paradigm by training directly from the data with disagreements, without using any aggregated label. Performance under both soft and hard evaluation is reported. The proposed system uses large language models (i.e., mBERT and XLM-RoBERTa) and ensemble strategies for sexism identification and classification in English and Spanish. In particular, our system is articulated in three different pipelines. The ensemble approach outperformed the individual large language models, obtaining the best performance under both soft and hard label evaluation. This work describes our participation in all three EXIST tasks; considering the soft evaluation, our system obtained fourth place in Task 2 and first place in Task 3, with the highest ICM-Soft of -2.32 and a normalized ICM-Soft of 0.79. The source code of our approaches is publicly available at https://github.com/AngelFelipeMP/Sexism-LLM-Learning-With-Disagreement.
    Evaluating Biased Attitude Associations of Language Models in an Intersectional Context. (arXiv:2307.03360v1 [cs.CY])
Language models are trained on large-scale corpora that embed implicit biases documented in psychology. Valence associations (pleasantness/unpleasantness) of social groups determine the biased attitudes towards groups and concepts in social cognition. Building on this established literature, we quantify how social groups are valenced in English language models using a sentence template that provides an intersectional context. We study biases related to age, education, gender, height, intelligence, literacy, race, religion, sex, sexual orientation, social class, and weight. We present a concept projection approach to capture the valence subspace through contextualized word embeddings of language models. Adapting the projection-based approach to embedding association tests that quantify bias, we find that language models exhibit the most biased attitudes against gender identity, social class, and sexual orientation signals in language. We find that the largest and best-performing model that we study is also the most biased, as it effectively captures bias embedded in sociocultural data. We validate the bias evaluation method through strong performance on an intrinsic valence evaluation task. The approach enables us to measure complex intersectional biases as they are known to manifest in the outputs and applications of language models that perpetuate historical biases. Moreover, our approach contributes to design justice as it studies the associations of groups underrepresented in language, such as transgender and homosexual individuals.
    Merging-Diverging Hybrid Transformer Networks for Survival Prediction in Head and Neck Cancer. (arXiv:2307.03427v1 [eess.IV])
    Survival prediction is crucial for cancer patients as it provides early prognostic information for treatment planning. Recently, deep survival models based on deep learning and medical images have shown promising performance for survival prediction. However, existing deep survival models are not well developed in utilizing multi-modality images (e.g., PET-CT) and in extracting region-specific information (e.g., the prognostic information in Primary Tumor (PT) and Metastatic Lymph Node (MLN) regions). In view of this, we propose a merging-diverging learning framework for survival prediction from multi-modality images. This framework has a merging encoder to fuse multi-modality information and a diverging decoder to extract region-specific information. In the merging encoder, we propose a Hybrid Parallel Cross-Attention (HPCA) block to effectively fuse multi-modality features via parallel convolutional layers and cross-attention transformers. In the diverging decoder, we propose a Region-specific Attention Gate (RAG) block to screen out the features related to lesion regions. Our framework is demonstrated on survival prediction from PET-CT images in Head and Neck (H&N) cancer, by designing an X-shape merging-diverging hybrid transformer network (named XSurv). Our XSurv combines the complementary information in PET and CT images and extracts the region-specific prognostic information in PT and MLN regions. Extensive experiments on the public dataset of HEad and neCK TumOR segmentation and outcome prediction challenge (HECKTOR 2022) demonstrate that our XSurv outperforms state-of-the-art survival prediction methods.
    Distilled Pruning: Using Synthetic Data to Win the Lottery. (arXiv:2307.03364v1 [cs.LG])
    This work introduces a novel approach to pruning deep learning models by using distilled data. Unlike conventional strategies which primarily focus on architectural or algorithmic optimization, our method reconsiders the role of data in these scenarios. Distilled datasets capture essential patterns from larger datasets, and we demonstrate how to leverage this capability to enable a computationally efficient pruning process. Our approach can find sparse, trainable subnetworks (a.k.a. Lottery Tickets) up to 5x faster than Iterative Magnitude Pruning at comparable sparsity on CIFAR-10. The experimental results highlight the potential of using distilled data for resource-efficient neural network pruning, model compression, and neural architecture search.
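A minimal sketch of the pipeline implied above: run the retraining loop of iterative magnitude pruning on a small distilled set instead of the full dataset. The random tensors below stand in for a distilled dataset, and weight rewinding and the distillation procedure itself are simplified away; this illustrates the loop structure, not the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
distilled_x = torch.randn(100, 32)              # stand-in for distilled inputs
distilled_y = torch.randint(0, 10, (100,))      # stand-in for distilled labels
loss_fn = nn.CrossEntropyLoss()

for rnd in range(5):                            # each round prunes 20% of what remains
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for epoch in range(50):                     # cheap: only 100 distilled examples
        loss = loss_fn(model(distilled_x), distilled_y)
        opt.zero_grad(); loss.backward(); opt.step()
    for layer in (model[0], model[2]):
        prune.l1_unstructured(layer, name="weight", amount=0.2)
```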
    DWReCO at CheckThat! 2023: Enhancing Subjectivity Detection through Style-based Data Sampling. (arXiv:2307.03550v1 [cs.CL])
    This paper describes our submission for the subjectivity detection task at the CheckThat! Lab. To tackle class imbalances in the task, we have generated additional training materials with GPT-3 models using prompts of different styles from a subjectivity checklist based on journalistic perspective. We used the extended training set to fine-tune language-specific transformer models. Our experiments in English, German and Turkish demonstrate that different subjective styles are effective across all languages. In addition, we observe that the style-based oversampling is better than paraphrasing in Turkish and English. Lastly, the GPT-3 models sometimes produce lacklustre results when generating style-based texts in non-English languages.
    Steel Surface Roughness Parameter Calculations Using Lasers and Machine Learning Models. (arXiv:2307.03723v1 [cs.LG])
Control of surface texture in strip steel is essential to meet customer requirements during galvanizing and temper rolling processes. Traditional methods rely on post-production stylus measurements, while on-line techniques offer non-contact, real-time measurements of the entire strip. However, ensuring accurate measurement is imperative for their effective utilization in the manufacturing pipeline. Moreover, accurate on-line measurements enable real-time adjustment of manufacturing processing parameters during production, ensuring consistent quality and the possibility of closed-loop control of the temper mill. In this study, we leverage state-of-the-art machine learning models to transform on-line measurements into a significantly more accurate Ra surface roughness metric. By comparing a selection of data-driven approaches, including both deep learning and non-deep learning methods, to the closed-form transformation, we evaluate their potential for improving surface texture control in temper strip steel manufacturing.
    Roman Numeral Analysis with Graph Neural Networks: Onset-wise Predictions from Note-wise Features. (arXiv:2307.03544v1 [cs.SD])
Roman Numeral analysis is the important task of identifying chords and their functional context in pieces of tonal music. This paper presents a new approach to automatic Roman Numeral analysis in symbolic music. While existing techniques rely on an intermediate lossy representation of the score, we propose a new method based on Graph Neural Networks (GNNs) that enables the direct description and processing of each individual note in the score. The proposed architecture can leverage note-wise features and interdependencies between notes, yet yields onset-wise representations by virtue of our novel edge contraction algorithm. Our results demonstrate that ChordGNN outperforms existing state-of-the-art models, achieving higher accuracy in Roman Numeral analysis on the reference datasets. In addition, we investigate variants of our model using proposed techniques such as NADE and post-processing of the chord predictions. The full source code for this work is available at https://github.com/manoskary/chordgnn
    Learning from Heterogeneity: A Dynamic Learning Framework for Hypergraphs. (arXiv:2307.03411v1 [cs.LG])
Graph neural network (GNN) has gained increasing popularity in recent years owing to its capability and flexibility in modeling complex graph-structured data. Among graph learning methods, hypergraph learning is a technique for exploring implicit higher-order correlations when training the embedding space of the graph. In this paper, we propose a hypergraph learning framework named LFH that is capable of dynamic hyperedge construction and attentive embedding updates utilizing the heterogeneous attributes of the graph. Specifically, in our framework, high-quality features are first generated by a pairwise fusion strategy that utilizes explicit graph structure information when generating initial node embeddings. Afterwards, a hypergraph is constructed through the dynamic grouping of implicit hyperedges, followed by a type-specific hypergraph learning process. To evaluate the effectiveness of our proposed framework, we conduct comprehensive experiments on several popular datasets with eleven state-of-the-art models on both node classification and link prediction tasks, which fall into the categories of homogeneous pairwise graph learning, heterogeneous pairwise graph learning, and hypergraph learning. The experimental results demonstrate a significant performance gain (average 12.5% in node classification and 13.3% in link prediction) compared with recent state-of-the-art methods.
    Region-Wise Attentive Multi-View Representation Learning for Urban Region Embeddings. (arXiv:2307.03212v1 [cs.CV])
    Urban region embedding is an important yet highly challenging task due to the complexity and constantly changing nature of urban data. To address these challenges, we propose Region-Wise Multi-View Representation Learning (ROMER) to capture multi-view dependencies and learn expressive representations of urban regions without the constraints of rigid neighbourhood region conditions. Our model focuses on learning urban region representations from multi-source urban data. First, we capture multi-view correlations from mobility flow patterns, POI semantics, and check-in dynamics. Then, we adopt global graph attention networks to learn the similarity of any two vertices in the graphs. To comprehensively consider and share features across multiple views, a two-stage fusion module is further proposed that learns weights with external attention to fuse multi-view embeddings. Extensive experiments on two downstream tasks over real-world datasets demonstrate that our model outperforms state-of-the-art methods by up to 17%.
    DEFT: Exploiting Gradient Norm Difference between Model Layers for Scalable Gradient Sparsification. (arXiv:2307.03500v1 [cs.LG])
    Gradient sparsification is a widely adopted solution for reducing the excessive communication traffic in distributed deep learning. However, most existing gradient sparsifiers scale relatively poorly because of the considerable computational cost of gradient selection and/or the increased communication traffic caused by gradient build-up. To address these challenges, we propose a novel gradient sparsification scheme, DEFT, that partitions the gradient selection task into sub-tasks and distributes them to workers. DEFT differs from existing sparsifiers, in which every worker selects among all gradients; consequently, its computational cost shrinks as the number of workers increases. Moreover, gradient build-up is eliminated because DEFT lets workers select gradients in non-intersecting partitions, so even as the number of workers grows, the communication traffic stays at the level the user requires. To avoid losing important gradients, DEFT selects more gradients in layers with a larger gradient norm than in other layers. Because every layer carries a different computational load, DEFT allocates layers to workers using a bin-packing algorithm to keep the gradient-selection load balanced between workers. In our empirical evaluation, DEFT shows a significant improvement in training performance in terms of gradient-selection speed over existing sparsifiers while achieving high convergence performance.
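    A rough sketch of the two ingredients above, under our own simplifying assumptions (a uniform selection density per layer rather than norm-proportional selection, and a greedy largest-first bin-packing):

```python
import numpy as np

def assign_layers(layer_sizes, num_workers):
    """Greedy bin-packing: hand each layer to the currently least-loaded
    worker so the gradient-selection cost stays balanced."""
    loads = np.zeros(num_workers)
    assignment = {w: [] for w in range(num_workers)}
    for layer in np.argsort(layer_sizes)[::-1]:        # largest layers first
        w = int(np.argmin(loads))
        assignment[w].append(int(layer))
        loads[w] += layer_sizes[layer]
    return assignment

def select_topk(gradients, assignment, worker_id, density=0.01):
    """Each worker sparsifies only its own, non-intersecting partition,
    so no gradient build-up occurs when results are aggregated."""
    selected = {}
    for layer in assignment[worker_id]:
        g = gradients[layer]
        k = max(1, int(density * g.size))
        idx = np.argpartition(np.abs(g), -k)[-k:]      # top-k by magnitude
        selected[layer] = (idx, g[idx])
    return selected

grads = [np.random.randn(n) for n in (1000, 4000, 500, 2500)]
assignment = assign_layers(np.array([g.size for g in grads]), num_workers=2)
print(assignment, {l: v[0].size for l, v in select_topk(grads, assignment, 0).items()})
```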
    Sparse Graphical Linear Dynamical Systems. (arXiv:2307.03210v1 [cs.LG])
    Time-series datasets are central in numerous fields of science and engineering, such as biomedicine, Earth observation, and network analysis. Extensive research exists on state-space models (SSMs), which are powerful mathematical tools that allow for probabilistic and interpretable learning on time series. Estimating the model parameters in SSMs is arguably one of the most complicated tasks, and the inclusion of prior knowledge is known both to ease interpretation and to complicate the inferential tasks. Very recent works have attempted to incorporate a graphical perspective on some of those model parameters, but they present notable limitations that this work addresses. More generally, existing graphical modeling tools are designed to incorporate either static information, focusing on statistical dependencies among independent random variables (e.g., the graphical Lasso approach), or dynamic information, emphasizing causal relationships among time series samples (e.g., graphical Granger approaches). However, there are no joint approaches combining static and dynamic graphical modeling within the context of SSMs. This work fills this gap by introducing a joint graphical modeling framework that bridges the static graphical Lasso model and a causal-based graphical approach for the linear-Gaussian SSM. We present DGLASSO (Dynamic Graphical Lasso), a new inference method within this framework that implements an efficient block alternating majorization-minimization algorithm. The algorithm's convergence is established using modern tools from nonlinear analysis. Experimental validation on synthetic and real weather variability data showcases the effectiveness of the proposed model and inference algorithm.  ( 2 min )
    Discovering Hierarchical Achievements in Reinforcement Learning via Contrastive Learning. (arXiv:2307.03486v1 [cs.LG])
    Discovering achievements with a hierarchical structure in procedurally generated environments poses a significant challenge. This requires agents to possess a broad range of abilities, including generalization and long-term reasoning. Many prior methods are built upon model-based or hierarchical approaches, with the belief that an explicit module for long-term planning would be beneficial for learning hierarchical achievements. However, these methods require an excessive amount of environment interactions or large model sizes, limiting their practicality. In this work, we identify that proximal policy optimization (PPO), a simple and versatile model-free algorithm, outperforms the prior methods with recent implementation practices. Moreover, we find that the PPO agent can predict the next achievement to be unlocked to some extent, though with low confidence. Based on this observation, we propose a novel contrastive learning method, called achievement distillation, that strengthens the agent's capability to predict the next achievement. Our method exhibits a strong capacity for discovering hierarchical achievements and shows state-of-the-art performance on the challenging Crafter environment using fewer model parameters in a sample-efficient regime.  ( 2 min )
    Differential Privacy for Clustering Under Continual Observation. (arXiv:2307.03430v1 [cs.DS])
    We consider the problem of privately clustering a dataset in $\mathbb{R}^d$ that undergoes both insertion and deletion of points. Specifically, we give an $\varepsilon$-differentially private clustering mechanism for the $k$-means objective under continual observation. This is the first approximation algorithm for this problem whose additive error depends only logarithmically on the number $T$ of updates; the multiplicative error is almost the same as in the non-private setting. To achieve this, we show how to perform dimension reduction under continual observation and combine it with a differentially private greedy approximation algorithm for $k$-means. We also partially extend our results to the $k$-median problem.  ( 2 min )
    Machine Learning to detect cyber-attacks and discriminating the types of power system disturbances. (arXiv:2307.03323v1 [cs.LG])
    This research proposes a machine learning-based attack detection model for power systems, specifically targeting smart grids. By utilizing data and logs collected from Phasor Measuring Devices (PMUs), the model aims to learn system behaviors and effectively identify potential security boundaries. The proposed approach involves the crucial stages of dataset pre-processing, feature selection, model creation, and evaluation. To validate our approach, we used a dataset consisting of 15 separate datasets obtained from different PMUs, relay Snort alarms, and logs. Three machine learning models (Random Forest, Logistic Regression, and K-Nearest Neighbour) were built and evaluated using various performance metrics. The findings indicate that the Random Forest model achieves the highest performance, with an accuracy of 90.56% in detecting power system disturbances, and has the potential to assist operators in decision-making.  ( 2 min )
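    As a hedged illustration of the evaluation pipeline described above (with synthetic stand-in data, since the PMU datasets are not included here), a scikit-learn sketch might look like this:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

# Synthetic stand-in for PMU-derived features (the study uses 15 PMU datasets).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 32))
y = rng.integers(0, 2, size=2000)        # 0 = normal event, 1 = disturbance/attack
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "RandomForest": RandomForestClassifier(n_estimators=200, random_state=0),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    print(name, accuracy_score(y_te, pred), f1_score(y_te, pred))
```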
    Quantification of Uncertainty with Adversarial Models. (arXiv:2307.03217v1 [cs.LG])
    Quantifying uncertainty is important for actionable predictions in real-world applications. A crucial part of predictive uncertainty quantification is the estimation of epistemic uncertainty, which is defined as an integral of the product between a divergence function and the posterior. Current methods such as Deep Ensembles or MC dropout underperform at estimating the epistemic uncertainty, since they primarily consider the posterior when sampling models. We suggest Quantification of Uncertainty with Adversarial Models (QUAM) to better estimate the epistemic uncertainty. QUAM identifies regions where the whole product under the integral is large, not just the posterior. Consequently, QUAM has lower approximation error of the epistemic uncertainty compared to previous methods. Models for which the product is large correspond to adversarial models (not adversarial examples!). Adversarial models have both a high posterior as well as a high divergence between their predictions and that of a reference model. Our experiments show that QUAM excels in capturing epistemic uncertainty for deep learning models and outperforms previous methods on challenging tasks in the vision domain.  ( 2 min )
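    The search for adversarial models can be sketched as a penalized optimization: start from a trained reference model and ascend the prediction divergence at a test point while keeping the training loss low (our proxy for a high posterior). Every name and the exact objective below are assumptions, not the paper's algorithm:

```python
import torch

def find_adversarial_model(ref_model, make_model, x_star, train_x, train_y,
                           loss_fn, lam=1.0, steps=200, lr=1e-2):
    """Look for a model with low training loss (high-posterior proxy) whose
    prediction at x_star diverges from the reference (cf. QUAM's idea)."""
    adv = make_model()
    adv.load_state_dict(ref_model.state_dict())          # start at the reference
    opt = torch.optim.Adam(adv.parameters(), lr=lr)
    ref_p = torch.softmax(ref_model(x_star), dim=1).detach()
    for _ in range(steps):
        opt.zero_grad()
        adv_logp = torch.log_softmax(adv(x_star), dim=1)
        divergence = torch.nn.functional.kl_div(adv_logp, ref_p,
                                                reduction="batchmean")
        objective = -divergence + lam * loss_fn(adv(train_x), train_y)
        objective.backward()
        opt.step()
    return adv

make_model = lambda: torch.nn.Sequential(torch.nn.Linear(2, 16), torch.nn.ReLU(),
                                         torch.nn.Linear(16, 2))
ref = make_model()
x = torch.randn(64, 2)
y = (x[:, 0] > 0).long()
loss_fn = torch.nn.CrossEntropyLoss()
ref_opt = torch.optim.Adam(ref.parameters())
for _ in range(200):
    ref_opt.zero_grad()
    loss_fn(ref(x), y).backward()
    ref_opt.step()
x_star = torch.randn(1, 2) * 3                           # far from the training data
adv = find_adversarial_model(ref, make_model, x_star, x, y, loss_fn)
print(torch.softmax(ref(x_star), 1), torch.softmax(adv(x_star), 1))
```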
    Scalable High-Dimensional Multivariate Linear Regression for Feature-Distributed Data. (arXiv:2307.03410v1 [stat.ML])
    Feature-distributed data, i.e., data partitioned by features and stored across multiple computing nodes, are increasingly common in applications with a large number of features. This paper proposes a two-stage relaxed greedy algorithm (TSRGA) for applying multivariate linear regression to such data. The main advantage of TSRGA is that its communication complexity does not depend on the feature dimension, making it highly scalable to very large data sets. In addition, for multivariate response variables, TSRGA can be used to yield low-rank coefficient estimates. The fast convergence of TSRGA is validated by simulation experiments. Finally, we apply the proposed TSRGA in a financial application that leverages unstructured data from 10-K reports, demonstrating its usefulness in applications with many dense large-dimensional matrices.  ( 2 min )
    ITA: An Energy-Efficient Attention and Softmax Accelerator for Quantized Transformers. (arXiv:2307.03493v1 [cs.AR])
    Transformer networks have emerged as the state-of-the-art approach for natural language processing tasks and are gaining popularity in other domains such as computer vision and audio processing. However, the efficient hardware acceleration of transformer models poses new challenges due to their high arithmetic intensities, large memory requirements, and complex dataflow dependencies. In this work, we propose ITA, a novel accelerator architecture for transformers and related models that targets efficient inference on embedded systems by exploiting 8-bit quantization and an innovative softmax implementation that operates exclusively on integer values. By computing on-the-fly in streaming mode, our softmax implementation minimizes data movement and energy consumption. ITA achieves competitive energy efficiency with respect to state-of-the-art transformer accelerators with 16.9 TOPS/W, while outperforming them in area efficiency with 5.93 TOPS/mm$^2$ in 22 nm fully-depleted silicon-on-insulator technology at 0.8 V.  ( 2 min )
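    The abstract does not spell out ITA's softmax, so the sketch below shows one generic fixed-point scheme purely to convey what an integer-only softmax can mean: exp is rewritten as a power of two, the integer part of the exponent becomes a right shift, and the fractional part is handled by a linear fit. None of this is the authors' design:

```python
import numpy as np

def integer_softmax(q, scale, frac_bits=16, out_bits=8):
    """Integer-only softmax sketch: exp(x) = 2^(x*log2(e)); the integer part
    of the exponent is a right shift, the fractional part a linear fit.
    `q` holds int32 quantized logits with dequantization `scale` (x = q*scale)."""
    q = q.astype(np.int64) - q.max()                       # shift so all <= 0
    c = int(round(scale * np.log2(np.e) * (1 << frac_bits)))
    z = -q * c                                             # fixed point, >= 0
    n = z >> frac_bits                                     # integer part
    r = z - (n << frac_bits)                               # fractional part
    # 2^(-u) ~ 1 - u/2 on [0, 1): exact at both endpoints.
    p = ((1 << frac_bits) - (r >> 1)) >> np.minimum(n, 62) # unnormalized probs
    return (p * ((1 << out_bits) - 1) // max(1, p.sum())).astype(np.int32)

logits = np.array([2.0, 1.0, 0.1, -1.0])
scale = 1 / 16
q = np.round(logits / scale).astype(np.int32)
print(integer_softmax(q, scale))                              # integer probabilities
print(np.round(np.exp(logits) / np.exp(logits).sum() * 255))  # float reference
```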
    Personalized Prediction of Recurrent Stress Events Using Self-Supervised Learning on Multimodal Time-Series Data. (arXiv:2307.03337v1 [cs.LG])
    Chronic stress can significantly affect physical and mental health. The advent of wearable technology allows for the tracking of physiological signals, potentially leading to innovative stress prediction and intervention methods. However, challenges such as label scarcity and data heterogeneity render stress prediction difficult in practice. To counter these issues, we have developed a multimodal personalized stress prediction system using wearable biosignal data. We employ self-supervised learning (SSL) to pre-train the models on each subject's data, allowing the models to learn the baseline dynamics of the participant's biosignals prior to fine-tuning on the stress prediction task. We test our model on the Wearable Stress and Affect Detection (WESAD) dataset, demonstrating that our SSL models outperform non-SSL models while utilizing less than 5% of the annotations. These results suggest that our approach can personalize stress prediction to each user with minimal annotations. This paradigm has the potential to enable personalized prediction of a variety of recurring health events using complex multimodal data streams.  ( 2 min )
    Goal-Conditioned Predictive Coding as an Implicit Planner for Offline Reinforcement Learning. (arXiv:2307.03406v1 [cs.LG])
    Recent work has demonstrated the effectiveness of formulating decision making as a supervised learning problem on offline-collected trajectories. However, the benefits of performing sequence modeling on trajectory data are not yet clear. In this work we investigate whether sequence modeling can condense trajectories into useful representations that contribute to policy learning. To achieve this, we adopt a two-stage framework that first summarizes trajectories with sequence modeling techniques, and then employs these representations to learn a policy along with a desired goal. This design allows many existing supervised offline RL methods to be considered as specific instances of our framework. Within this framework, we introduce Goal-Conditioned Predictive Coding (GCPC), an approach that brings powerful trajectory representations and leads to performant policies. We conduct extensive empirical evaluations on AntMaze, FrankaKitchen and Locomotion environments, and observe that sequence modeling has a significant impact on some decision making tasks. In addition, we demonstrate that GCPC learns a goal-conditioned latent representation of the future, which serves as an "implicit planner" and enables competitive performance on all three benchmarks.  ( 2 min )
    Scalable Multi-Agent Reinforcement Learning for Warehouse Logistics with Robotic and Human Co-Workers. (arXiv:2212.11498v2 [cs.LG] UPDATED)
    We envision a warehouse in which dozens of mobile robots and human pickers work together to collect and deliver items within the warehouse. The fundamental problem we tackle, called the order-picking problem, is how these worker agents must coordinate their movement and actions in the warehouse to maximise performance (e.g. order throughput). Established industry methods using heuristic approaches require large engineering efforts to optimise for innately variable warehouse configurations. In contrast, multi-agent reinforcement learning (MARL) can be flexibly applied to diverse warehouse configurations (e.g. size, layout, number/types of workers, item replenishment frequency), as the agents learn through experience how to optimally cooperate with one another. We develop hierarchical MARL algorithms in which a manager assigns goals to worker agents, and the policies of the manager and workers are co-trained toward maximising a global objective (e.g. pick rate). Our hierarchical algorithms achieve significant gains in sample efficiency and overall pick rates over baseline MARL algorithms in diverse warehouse configurations, and substantially outperform two established industry heuristics for order-picking systems.  ( 2 min )
    Robust Tickets Can Transfer Better: Drawing More Transferable Subnetworks in Transfer Learning. (arXiv:2304.11834v2 [cs.LG] UPDATED)
    Transfer learning leverages feature representations of deep neural networks (DNNs) pretrained on source tasks with rich data to empower effective finetuning on downstream tasks. However, the pretrained models are often prohibitively large for delivering generalizable representations, which limits their deployment on edge devices with constrained resources. To close this gap, we propose a new transfer learning pipeline, which leverages our finding that robust tickets can transfer better, i.e., subnetworks drawn with properly induced adversarial robustness can win better transferability over vanilla lottery ticket subnetworks. Extensive experiments and ablation studies validate that our proposed transfer learning pipeline can achieve enhanced accuracy-sparsity trade-offs across both diverse downstream tasks and sparsity patterns, further enriching the lottery ticket hypothesis.  ( 2 min )
    Harmonizing Feature Attributions Across Deep Learning Architectures: Enhancing Interpretability and Consistency. (arXiv:2307.02150v2 [cs.LG] UPDATED)
    Ensuring the trustworthiness and interpretability of machine learning models is critical to their deployment in real-world applications. Feature attribution methods have gained significant attention, which provide local explanations of model predictions by attributing importance to individual input features. This study examines the generalization of feature attributions across various deep learning architectures, such as convolutional neural networks (CNNs) and vision transformers. We aim to assess the feasibility of utilizing a feature attribution method as a future detector and examine how these features can be harmonized across multiple models employing distinct architectures but trained on the same data distribution. By exploring this harmonization, we aim to develop a more coherent and optimistic understanding of feature attributions, enhancing the consistency of local explanations across diverse deep-learning models. Our findings highlight the potential for harmonized feature attribution methods to improve interpretability and foster trust in machine learning applications, regardless of the underlying architecture.  ( 2 min )
    Language Models are Bounded Pragmatic Speakers: Understanding RLHF from a Bayesian Cognitive Modeling Perspective. (arXiv:2305.17760v4 [cs.CL] UPDATED)
    How do language models "think"? This paper formulates a probabilistic cognitive model called the bounded pragmatic speaker, which can characterize the operation of different variations of language models. Specifically, we demonstrate that large language models fine-tuned with reinforcement learning from human feedback (Ouyang et al., 2022) embody a model of thought that conceptually resembles a fast-and-slow model (Kahneman, 2011), which psychologists have attributed to humans. We discuss the limitations of reinforcement learning from human feedback as a fast-and-slow model of thought and propose avenues for expanding this framework. In essence, our research highlights the value of adopting a cognitive probabilistic modeling approach to gain insights into the comprehension, evaluation, and advancement of language models.  ( 2 min )
    Learning Homogenization for Elliptic Operators. (arXiv:2306.12006v2 [math.NA] UPDATED)
    Multiscale partial differential equations (PDEs) arise in various applications, and several schemes have been developed to solve them efficiently. Homogenization theory is a powerful methodology that eliminates the small-scale dependence, resulting in simplified equations that are computationally tractable. In the field of continuum mechanics, homogenization is crucial for deriving constitutive laws that incorporate microscale physics in order to formulate balance laws for the macroscopic quantities of interest. However, obtaining homogenized constitutive laws is often challenging as they do not in general have an analytic form and can exhibit phenomena not present on the microscale. In response, data-driven learning of the constitutive law has been proposed as appropriate for this task. However, a major challenge in data-driven learning approaches for this problem has remained unexplored: the impact of discontinuities and corner interfaces in the underlying material. These discontinuities in the coefficients affect the smoothness of the solutions of the underlying equations. Given the prevalence of discontinuous materials in continuum mechanics applications, it is important to address the challenge of learning in this context; in particular to develop underpinning theory to establish the reliability of data-driven methods in this scientific domain. The paper addresses this unexplored challenge by investigating the learnability of homogenized constitutive laws for elliptic operators in the presence of such complexities. Approximation theory is presented, and numerical experiments are performed which validate the theory for the solution operator defined by the cell-problem arising in homogenization for elliptic PDEs.  ( 3 min )
    Adaptive Strategies in Non-convex Optimization. (arXiv:2306.10278v2 [cs.LG] UPDATED)
    An algorithm is said to be adaptive to a certain parameter (of the problem) if it does not need a priori knowledge of that parameter but performs competitively with algorithms that know it. This dissertation presents our work on adaptive algorithms in the following scenarios: 1. In the stochastic optimization setting, we only receive stochastic gradients, and the level of noise in evaluating them greatly affects the convergence rate. Without prior knowledge of the noise scale, tuning is typically required to achieve the optimal rate. Considering this, we designed and analyzed noise-adaptive algorithms that can automatically ensure (near-)optimal rates under different noise scales without knowing them. 2. In training deep neural networks, the scales of gradient magnitudes in each coordinate can scatter across a very wide range unless normalization techniques, like BatchNorm, are employed. In such situations, algorithms that do not address gradient scales can behave very poorly. To mitigate this, we formally established the advantage of scale-free algorithms that adapt to the gradient scales and demonstrated their real benefits in empirical experiments. 3. Traditional analyses in non-convex optimization typically rely on the smoothness assumption. Yet, this condition does not capture the properties of some deep learning objective functions, including those involving Long Short-Term Memory networks and Transformers. Instead, they satisfy a much more relaxed condition, with potentially unbounded smoothness. Under this condition, we show that a generalized SignSGD algorithm can theoretically match the best-known convergence rates obtained by SGD with gradient clipping, without needing explicit clipping at all, and can empirically match the performance of Adam and beat others. Moreover, it can also be made to automatically adapt to the unknown relaxed smoothness.  ( 3 min )
    Offline Prioritized Experience Replay. (arXiv:2306.05412v2 [cs.LG] UPDATED)
    Offline reinforcement learning (RL) is challenged by the distributional shift problem. To address this problem, existing works mainly focus on designing sophisticated policy constraints between the learned policy and the behavior policy. However, these constraints are applied equally to well-performing and inferior actions through uniform sampling, which might negatively affect the learned policy. To alleviate this issue, we propose Offline Prioritized Experience Replay (OPER), featuring a class of priority functions designed to prioritize highly-rewarding transitions, making them more frequently visited during training. Through theoretical analysis, we show that this class of priority functions induces an improved behavior policy, and when constrained to this improved policy, a policy-constrained offline RL algorithm is likely to yield a better solution. We develop two practical strategies to obtain priority weights by estimating advantages based on a fitted value network (OPER-A) or utilizing trajectory returns (OPER-R) for quick computation. OPER is a plug-and-play component for offline RL algorithms. As case studies, we evaluate OPER on five different algorithms, including BC, TD3+BC, Onestep RL, CQL, and IQL. Extensive experiments demonstrate that both OPER-A and OPER-R significantly improve the performance of all baseline methods. Codes and priority weights are available at https://github.com/sail-sg/OPER.  ( 2 min )
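    The return-based variant (OPER-R) can be approximated in a few lines: weight every transition by a normalized function of its trajectory's return and sample replay batches proportionally. The normalization and temperature below are our assumptions; the authors' exact weights are in the linked repository:

```python
import numpy as np

def oper_r_weights(trajectory_returns, transitions_per_traj, temperature=1.0):
    """Return-based priorities in the spirit of OPER-R: transitions from
    higher-return trajectories are replayed more often."""
    r = np.asarray(trajectory_returns, dtype=np.float64)
    r = (r - r.min()) / max(r.max() - r.min(), 1e-8)     # normalize to [0, 1]
    w = np.exp(r / temperature)                          # softmax-style priorities
    per_transition = np.repeat(w, transitions_per_traj)  # one weight per transition
    return per_transition / per_transition.sum()

returns = [10.0, 250.0, 40.0]          # one return per offline trajectory
lengths = [5, 3, 4]                    # number of transitions in each
probs = oper_r_weights(returns, lengths)
batch = np.random.default_rng(0).choice(len(probs), size=8, p=probs)
print(probs.round(3), batch)           # the high-return trajectory dominates
```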
    Multiclass Online Learning and Uniform Convergence. (arXiv:2303.17716v2 [cs.LG] UPDATED)
    We study multiclass classification in the agnostic adversarial online learning setting. As our main result, we prove that any multiclass concept class is agnostically learnable if and only if its Littlestone dimension is finite. This solves an open problem studied by Daniely, Sabato, Ben-David, and Shalev-Shwartz (2011,2015) who handled the case when the number of classes (or labels) is bounded. We also prove a separation between online learnability and online uniform convergence by exhibiting an easy-to-learn class whose sequential Rademacher complexity is unbounded. Our learning algorithm uses the multiplicative weights algorithm, with a set of experts defined by executions of the Standard Optimal Algorithm on subsequences of size Littlestone dimension. We argue that the best expert has regret at most Littlestone dimension relative to the best concept in the class. This differs from the well-known covering technique of Ben-David, P\'{a}l, and Shalev-Shwartz (2009) for binary classification, where the best expert has regret zero.  ( 2 min )
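    For readers unfamiliar with the expert framework used here, a generic multiplicative weights loop looks as follows. Note that the paper's experts are executions of the Standard Optimal Algorithm on subsequences, whereas this toy uses arbitrary prediction streams:

```python
import numpy as np

def multiplicative_weights(expert_preds, labels, eta=0.5):
    """Generic multiplicative weights: down-weight experts that err and
    predict by a weighted vote over the experts' current predictions."""
    w = np.ones(expert_preds.shape[0])
    mistakes = 0
    for t, y in enumerate(labels):
        votes = np.bincount(expert_preds[:, t], weights=w)
        mistakes += int(int(np.argmax(votes)) != y)      # weighted-vote prediction
        w *= np.exp(-eta * (expert_preds[:, t] != y))    # penalize wrong experts
    return mistakes, w

rng = np.random.default_rng(1)
labels = rng.integers(0, 3, size=100)                    # 3-class label sequence
preds = rng.integers(0, 3, size=(5, 100))                # five experts
preds[0] = labels                                        # expert 0 is perfect
print(multiplicative_weights(preds, labels))             # mass shifts to expert 0
```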
    Highly Accurate Quantum Chemical Property Prediction with Uni-Mol+. (arXiv:2303.16982v2 [physics.chem-ph] UPDATED)
    Recent developments in deep learning have made remarkable progress in speeding up the prediction of quantum chemical (QC) properties by removing the need for expensive electronic structure calculations like density functional theory. However, previous methods learned from 1D SMILES sequences or 2D molecular graphs failed to achieve high accuracy as QC properties primarily depend on the 3D equilibrium conformations optimized by electronic structure methods, far different from the sequence-type and graph-type data. In this paper, we propose a novel approach called Uni-Mol+ to tackle this challenge. Uni-Mol+ first generates a raw 3D molecule conformation from inexpensive methods such as RDKit. Then, the raw conformation is iteratively updated to its target DFT equilibrium conformation using neural networks, and the learned conformation will be used to predict the QC properties. To effectively learn this update process towards the equilibrium conformation, we introduce a two-track Transformer model backbone and train it with the QC property prediction task. We also design a novel approach to guide the model's training process. Our extensive benchmarking results demonstrate that the proposed Uni-Mol+ significantly improves the accuracy of QC property prediction in various datasets. We have made the code and model publicly available at \url{https://github.com/dptech-corp/Uni-Mol}.  ( 2 min )
    Identifying Patient-Specific Root Causes with the Heteroscedastic Noise Model. (arXiv:2205.13085v2 [stat.ML] UPDATED)
    Complex diseases are caused by a multitude of factors that may differ between patients even within the same diagnostic category. A few underlying root causes may nevertheless initiate the development of disease within each patient. We therefore focus on identifying patient-specific root causes of disease, which we equate to the sample-specific predictivity of the exogenous error terms in a structural equation model. We generalize from the linear setting to the heteroscedastic noise model where $Y = m(X) + \varepsilon\sigma(X)$ with non-linear functions $m(X)$ and $\sigma(X)$ representing the conditional mean and mean absolute deviation, respectively. This model preserves identifiability but introduces non-trivial challenges that require a customized algorithm called Generalized Root Causal Inference (GRCI) to extract the error terms correctly. GRCI recovers patient-specific root causes more accurately than existing alternatives.  ( 2 min )
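    The core recovery step under the heteroscedastic noise model follows directly from the displayed equation: fit the conditional mean $m(X)$, fit the conditional mean absolute deviation $\sigma(X)$ on absolute residuals, and divide. GRCI itself adds more machinery; the sketch below only shows this error-extraction idea, with the regressor choices being our assumptions:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 3))
m_true = np.sin(X[:, 0]) + X[:, 1] ** 2
s_true = 0.5 + np.abs(X[:, 2])
Y = m_true + rng.normal(size=3000) * s_true              # Y = m(X) + eps * sigma(X)

mean_model = GradientBoostingRegressor().fit(X, Y)       # estimate m(X)
resid = Y - mean_model.predict(X)
scale_model = GradientBoostingRegressor().fit(X, np.abs(resid))  # estimate MAD sigma(X)
eps_hat = resid / np.clip(scale_model.predict(X), 1e-3, None)    # recovered error terms
print(np.corrcoef(eps_hat, (Y - m_true) / s_true)[0, 1])         # agreement with truth
```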
    On discrete symmetries of robotics systems: A group-theoretic and data-driven analysis. (arXiv:2302.10433v3 [cs.RO] UPDATED)
    We present a comprehensive study on discrete morphological symmetries of dynamical systems, which are commonly observed in biological and artificial locomoting systems, such as legged, swimming, and flying animals/robots/virtual characters. These symmetries arise from the presence of one or more planes/axis of symmetry in the system's morphology, resulting in harmonious duplication and distribution of body parts. Significantly, we characterize how morphological symmetries extend to symmetries in the system's dynamics, optimal control policies, and in all proprioceptive and exteroceptive measurements related to the system's dynamics evolution. In the context of data-driven methods, symmetry represents an inductive bias that justifies the use of data augmentation or symmetric function approximators. To tackle this, we present a theoretical and practical framework for identifying the system's morphological symmetry group $\mathcal{G}$ and characterizing the symmetries in proprioceptive and exteroceptive data measurements. We then exploit these symmetries using data augmentation and $\mathcal{G}$-equivariant neural networks. Our experiments on both synthetic and real-world applications provide empirical evidence of the advantageous outcomes resulting from the exploitation of these symmetries, including improved sample efficiency, enhanced generalization, and reduction of trainable parameters.  ( 2 min )
    RObotic MAnipulation Network (ROMAN) – Hybrid Hierarchical Learning for Solving Complex Sequential Tasks. (arXiv:2307.00125v2 [cs.RO] UPDATED)
    Solving long sequential tasks poses a significant challenge in embodied artificial intelligence. Enabling a robotic system to perform diverse sequential tasks with a broad range of manipulation skills is an active area of research. In this work, we present a Hybrid Hierarchical Learning framework, the Robotic Manipulation Network (ROMAN), to address the challenge of solving multiple complex tasks over long time horizons in robotic manipulation. ROMAN achieves task versatility and robust failure recovery by integrating behavioural cloning, imitation learning, and reinforcement learning. It consists of a central manipulation network that coordinates an ensemble of various neural networks, each specialising in distinct re-combinable sub-tasks to generate their correct in-sequence actions for solving complex long-horizon manipulation tasks. Experimental results show that by orchestrating and activating these specialised manipulation experts, ROMAN generates correct sequential activations for accomplishing long sequences of sophisticated manipulation tasks and achieving adaptive behaviours beyond demonstrations, while exhibiting robustness to various sensory noises. These results demonstrate the significance and versatility of ROMAN's dynamic adaptability featuring autonomous failure recovery capabilities, and highlight its potential for various autonomous manipulation tasks that demand adaptive motor skills.  ( 2 min )
    Federated Variational Inference Methods for Structured Latent Variable Models. (arXiv:2302.03314v2 [stat.ML] UPDATED)
    Federated learning methods enable model training across distributed data sources without data leaving their original locations and have gained increasing interest in various fields. However, existing approaches are limited, excluding many structured probabilistic models. We present a general and elegant solution based on structured variational inference, widely used in Bayesian machine learning, adapted for the federated setting. Additionally, we provide a communication-efficient variant analogous to the canonical FedAvg algorithm. The proposed algorithms' effectiveness is demonstrated, and their performance is compared with hierarchical Bayesian neural networks and topic models.  ( 2 min )
    Reward-Respecting Subtasks for Model-Based Reinforcement Learning. (arXiv:2202.03466v3 [cs.LG] UPDATED)
    To achieve the ambitious goals of artificial intelligence, reinforcement learning must include planning with a model of the world that is abstract in state and time. Deep learning has made progress with state abstraction, but temporal abstraction has rarely been used, despite extensively developed theory based on the options framework. One reason for this is that the space of possible options is immense, and the methods previously proposed for option discovery do not take into account how the option models will be used in planning. Options are typically discovered by posing subsidiary tasks, such as reaching a bottleneck state or maximizing the cumulative sum of a sensory signal other than reward. Each subtask is solved to produce an option, and then a model of the option is learned and made available to the planning process. In most previous work, the subtasks ignore the reward on the original problem, whereas we propose subtasks that use the original reward plus a bonus based on a feature of the state at the time the option terminates. We show that option models obtained from such reward-respecting subtasks are much more likely to be useful in planning than eigenoptions, shortest path options based on bottleneck states, or reward-respecting options generated by the option-critic. Reward-respecting subtasks strongly constrain the space of options and thereby also provide a partial solution to the problem of option discovery. Finally, we show how values, policies, options, and models can all be learned online and off-policy using standard algorithms and general value functions.  ( 3 min )
    k-strip: A novel segmentation algorithm in k-space for the application of skull stripping. (arXiv:2205.09706v2 [eess.IV] UPDATED)
    Objectives: To present a novel deep learning-based skull stripping algorithm for magnetic resonance imaging (MRI) that works directly in the information-rich k-space. Materials and Methods: Using two datasets from different institutions with a total of 36,900 MRI slices, we trained a deep learning-based model to work directly with the complex raw k-space data. Skull stripping performed by HD-BET (Brain Extraction Tool) in the image domain was used as the ground truth. Results: Results on both datasets were very similar to the ground truth (DICE scores of 92%-98% and Hausdorff distances under 5.5 mm). Slices above the eye region reach DICE scores of up to 99%, while accuracy drops in regions around and below the eyes, with partially blurred output. The output of k-strip often smoothed edges at the demarcation to the skull. Binary masks are created with an appropriate threshold. Conclusion: With this proof-of-concept study, we were able to show the feasibility of working in the k-space frequency domain, preserving phase information, with consistent results. Future research should be dedicated to discovering additional ways the k-space can be used for innovative image analysis and further workflows.  ( 3 min )
    Avoiding Post-Processing with Event-Based Detection in Biomedical Signals. (arXiv:2209.11007v2 [eess.SP] UPDATED)
    Objective: Finding events of interest is a common task in biomedical signal processing. The detection of epileptic seizures and signal artefacts are two key examples. Epoch-based classification is the typical machine learning framework to detect such signal events because of the straightforward application of classical machine learning techniques. Usually, post-processing is required to achieve good performance and enforce temporal dependencies. Designing the right post-processing scheme to convert these classification outputs into events is a tedious, and labor-intensive element of this framework. Methods: We propose an event-based modeling framework that directly works with events as learning targets, stepping away from ad-hoc post-processing schemes to turn model outputs into events. We illustrate the practical power of this framework on simulated data and real-world data, comparing it to epoch-based modeling approaches. Results: We show that event-based modeling (without post-processing) performs on par with or better than epoch-based modeling with extensive post-processing. Conclusion: These results show the power of treating events as direct learning targets, instead of using ad-hoc post-processing to obtain them, severely reducing design effort. Significance: The event-based modeling framework can easily be applied to other event detection problems in signal processing, removing the need for intensive task-specific post-processing.  ( 2 min )
    Stability Analysis Framework for Particle-based Distance GANs with Wasserstein Gradient Flow. (arXiv:2307.01879v2 [cs.LG] UPDATED)
    In this paper, we investigate the training process of generative networks that use a type of probability density distance named particle-based distance as the objective function, e.g. MMD GAN, Cramér GAN, EIEG GAN. These GANs, however, often suffer from unstable training. We analyze the stability of their training process from the perspective of probability density dynamics. In our framework, we regard the discriminator $D$ in these GANs as a feature transformation mapping that maps high-dimensional data into a feature space, while the generator $G$ maps random variables to samples that resemble real data in terms of the feature space. This perspective enables us to perform stability analysis for the training of GANs using the Wasserstein gradient flow of the probability density function. We find that the training process of the discriminator is usually unstable due to the $\min_G \max_D E(G, D)$ formulation in GANs. To address this issue, we add a stabilizing term to the discriminator loss function. We conduct experiments to validate our stability analysis and stabilizing method.  ( 2 min )
    Evaluating Similitude and Robustness of Deep Image Denoising Models via Adversarial Attack. (arXiv:2306.16050v2 [cs.CV] UPDATED)
    Deep neural networks (DNNs) have shown superior performance compared to traditional image denoising algorithms. However, DNNs are inevitably vulnerable to adversarial attacks. In this paper, we propose an adversarial attack method named denoising-PGD which can successfully attack all current deep denoising models while keeping the noise distribution almost unchanged. We surprisingly find that the current mainstream non-blind denoising models (DnCNN, FFDNet, ECNDNet, BRDNet), blind denoising models (DnCNN-B, Noise2Noise, RDDCNN-B, FAN), plug-and-play models (DPIR, CurvPnP) and unfolding denoising models (DeamNet) almost share the same adversarial sample set on both grayscale and color images, respectively. This shared adversarial sample set indicates that all these models behave similarly in the neighborhood of the test samples. We therefore propose an indicator to measure the local similarity of models, called robustness similitude. Non-blind denoising models are found to have high robustness similitude across each other, while hybrid-driven models also have high robustness similitude with pure data-driven non-blind denoising models. According to our robustness assessment, data-driven non-blind denoising models are the most robust. We use adversarial training to mitigate the vulnerability to adversarial attacks. Moreover, the model-driven image denoiser BM3D shows resistance to adversarial attacks.  ( 2 min )
    A Memetic Algorithm with Reinforcement Learning for Sociotechnical Production Scheduling. (arXiv:2212.10936v4 [cs.LG] UPDATED)
    The following interdisciplinary article presents a memetic algorithm that applies deep reinforcement learning (DRL) to solve practically oriented dual resource constrained flexible job shop scheduling problems (DRC-FJSSP). From research projects in industry, we recognize the need to consider flexible machines, flexible human workers, worker capabilities, setup and processing operations, material arrival times, complex job paths with parallel tasks for bill of material (BOM) manufacturing, sequence-dependent setup times and (partially) automated tasks in human-machine-collaboration. In recent years, there has been extensive research on metaheuristics and DRL techniques, but it has focused on simple scheduling environments. However, few approaches combine metaheuristics and DRL to generate schedules more reliably and efficiently. In this paper, we first formulate a DRC-FJSSP to map complex industry requirements beyond traditional job shop models. Then we propose a scheduling framework integrating a discrete event simulation (DES) for schedule evaluation, considering parallel computing and multicriteria optimization. Here, a memetic algorithm is enriched with DRL to improve sequencing and assignment decisions. Through numerical experiments with real-world production data, we confirm that the framework generates feasible schedules efficiently and reliably for a balanced optimization of makespan (MS) and total tardiness (TT). Utilizing DRL instead of random metaheuristic operations leads to better results in fewer algorithm iterations and outperforms traditional approaches in such complex environments.  ( 3 min )
    Bicriteria Approximation Algorithms for Priority Matroid Median. (arXiv:2210.01888v2 [cs.DS] UPDATED)
    Fairness considerations have motivated new clustering problems and algorithms in recent years. In this paper we consider the Priority Matroid Median problem which generalizes the Priority $k$-Median problem that has recently been studied. The input consists of a set of facilities $\mathcal{F}$ and a set of clients $\mathcal{C}$ that lie in a metric space $(\mathcal{F} \cup \mathcal{C},d)$, and a matroid $\mathcal{M}=(\mathcal{F},\mathcal{I})$ over the facilities. In addition each client $j$ has a specified radius $r_j \ge 0$ and each facility $i \in \mathcal{F}$ has an opening cost $f_i$. The goal is to choose a subset $S \subseteq \mathcal{F}$ of facilities to minimize $\sum_{i \in S} f_i + \sum_{j \in \mathcal{C}} d(j,S)$ subject to two constraints: (i) $S$ is an independent set in $\mathcal{M}$ (that is, $S \in \mathcal{I}$) and (ii) for each client $j$, its distance to an open facility is at most $r_j$ (that is, $d(j,S) \le r_j$). For this problem we describe the first bicriteria $(c_1,c_2)$ approximations for fixed constants $c_1,c_2$: the radius constraints of the clients are violated by at most a factor of $c_1$ and the objective cost is at most $c_2$ times the optimum cost. We also improve the previously known bicriteria approximation for the uniform radius setting ($r_j := L$ $\forall j \in \mathcal{C}$).  ( 2 min )
    Creativity of AI: Hierarchical Planning Model Learning for Facilitating Deep Reinforcement Learning. (arXiv:2112.09836v2 [cs.AI] UPDATED)
    Despite achieving great success in real-world applications, Deep Reinforcement Learning (DRL) still suffers from three critical issues: low data efficiency, lack of interpretability, and poor transferability. Recent research shows that embedding symbolic knowledge into DRL is promising in addressing those challenges. Inspired by this, we introduce a novel deep reinforcement learning framework with symbolic options. Our framework features a loop training procedure, which guides policy improvement by planning with planning models (including action models and hierarchical task network models) and symbolic options learned automatically from interaction trajectories. The learned symbolic options alleviate the dense requirement for expert domain knowledge and provide inherent interpretability of policies. Moreover, transferability and data efficiency can be further improved by planning with the symbolic planning models. To validate the effectiveness of our framework, we conduct experiments on two domains, Montezuma's Revenge and Office World. The results demonstrate comparable performance alongside improved data efficiency, interpretability, and transferability.  ( 2 min )
    Composite Optimization Algorithms for Sigmoid Networks. (arXiv:2303.00589v3 [math.OC] UPDATED)
    In this paper, we use composite optimization algorithms to solve sigmoid networks. We equivalently reformulate sigmoid networks as a convex composite optimization problem and propose composite optimization algorithms based on linearized proximal algorithms and the alternating direction method of multipliers. Under the assumptions of weak sharp minima and a regularity condition, the algorithm is guaranteed to converge to a globally optimal solution of the objective function even for non-convex and non-smooth problems. Furthermore, the convergence results can be directly related to the amount of training data and provide a general guide for setting the size of sigmoid networks. Numerical experiments on Franke's function fitting and handwritten digit recognition show that the proposed algorithms perform satisfactorily and robustly.  ( 2 min )
    Continuous-Time Functional Diffusion Processes. (arXiv:2303.00800v2 [cs.LG] UPDATED)
    We introduce Functional Diffusion Processes (FDPs), which generalize score-based diffusion models to infinite-dimensional function spaces. FDPs require a new mathematical framework to describe the forward and backward dynamics, and several extensions to derive practical training objectives. These include infinite-dimensional versions of Girsanov theorem, in order to be able to compute an ELBO, and of the sampling theorem, in order to guarantee that functional evaluations in a countable set of points are equivalent to infinite-dimensional functions. We use FDPs to build a new breed of generative models in function spaces, which do not require specialized network architectures, and that can work with any kind of continuous data. Our results on real data show that FDPs achieve high-quality image generation, using a simple MLP architecture with orders of magnitude fewer parameters than existing diffusion models.  ( 2 min )
    TabLeak: Tabular Data Leakage in Federated Learning. (arXiv:2210.01785v2 [cs.LG] UPDATED)
    While federated learning (FL) promises to preserve privacy, recent works in the image and text domains have shown that training updates leak private client data. However, most high-stakes applications of FL (e.g., in healthcare and finance) use tabular data, where the risk of data leakage has not yet been explored. A successful attack for tabular data must address two key challenges unique to the domain: (i) obtaining a solution to a high-variance mixed discrete-continuous optimization problem, and (ii) enabling human assessment of the reconstruction as unlike for image and text data, direct human inspection is not possible. In this work we address these challenges and propose TabLeak, the first comprehensive reconstruction attack on tabular data. TabLeak is based on two key contributions: (i) a method which leverages a softmax relaxation and pooled ensembling to solve the optimization problem, and (ii) an entropy-based uncertainty quantification scheme to enable human assessment. We evaluate TabLeak on four tabular datasets for both FedSGD and FedAvg training protocols, and show that it successfully breaks several settings previously deemed safe. For instance, we extract large subsets of private data at >90% accuracy even at the large batch size of 128. Our findings demonstrate that current high-stakes tabular FL is excessively vulnerable to leakage attacks.  ( 2 min )
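    The softmax relaxation, contribution (i), can be sketched in a few lines of PyTorch: each categorical feature is parameterized by logits and relaxed to a soft one-hot, so the mixed discrete-continuous reconstruction becomes a differentiable gradient-matching problem. Pooled ensembling and the entropy score are omitted, and the label is assumed known; these are our simplifications, not the full attack:

```python
import torch

def reconstruct(model, loss_fn, true_grads, n_cont, cat_cards, y, steps=500):
    """Softmax-relaxation sketch: continuous features are free parameters;
    each categorical feature is a softmax over its category logits."""
    cont = torch.randn(1, n_cont, requires_grad=True)
    cat_logits = [torch.randn(1, c, requires_grad=True) for c in cat_cards]
    opt = torch.optim.Adam([cont] + cat_logits, lr=0.1)
    for _ in range(steps):
        opt.zero_grad()
        x = torch.cat([cont] + [torch.softmax(l, dim=1) for l in cat_logits], dim=1)
        grads = torch.autograd.grad(loss_fn(model(x), y), model.parameters(),
                                    create_graph=True)
        match = sum(((g - t) ** 2).sum() for g, t in zip(grads, true_grads))
        match.backward()                                 # match the observed update
        opt.step()
    return cont.detach(), [l.argmax(dim=1) for l in cat_logits]

# Toy victim: 2 continuous features plus one 3-way categorical (one-hot width 3).
model = torch.nn.Linear(5, 2)
loss_fn = torch.nn.CrossEntropyLoss()
x_true = torch.tensor([[0.3, -1.2, 0.0, 1.0, 0.0]])
y = torch.tensor([1])
true_grads = torch.autograd.grad(loss_fn(model(x_true), y), model.parameters())
print(reconstruct(model, loss_fn, true_grads, 2, [3], y))
```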
    Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning. (arXiv:2301.10886v4 [cs.LG] UPDATED)
    We present AIRS: Automatic Intrinsic Reward Shaping that intelligently and adaptively provides high-quality intrinsic rewards to enhance exploration in reinforcement learning (RL). More specifically, AIRS selects a shaping function from a predefined set based on the estimated task return in real time, providing reliable exploration incentives and alleviating the biased-objective problem. Moreover, we develop an intrinsic reward toolkit to provide efficient and reliable implementations of diverse intrinsic reward approaches. We test AIRS on various tasks of MiniGrid, Procgen, and DeepMind Control Suite. Extensive simulation demonstrates that AIRS can outperform the benchmarking schemes and achieve superior performance with simple architecture.  ( 2 min )
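    The selection mechanism described, picking a shaping function from a predefined set by estimated task return, can be mimicked with a simple epsilon-greedy bandit. The class below is a hedged stand-in, not the authors' implementation:

```python
import numpy as np

class IntrinsicRewardSelector:
    """Pick, per update, the intrinsic-reward scheme whose recent task
    return is highest (a toy stand-in for AIRS's real-time selection)."""
    def __init__(self, names, eps=0.1):
        self.names = names
        self.returns = {n: [] for n in names}
        self.eps = eps

    def select(self, rng):
        unexplored = [n for n in self.names if not self.returns[n]]
        if unexplored or rng.random() < self.eps:
            return rng.choice(unexplored or self.names)  # explore
        return max(self.names, key=lambda n: np.mean(self.returns[n][-10:]))

    def update(self, name, episode_return):
        self.returns[name].append(episode_return)

rng = np.random.default_rng(0)
selector = IntrinsicRewardSelector(["icm", "rnd", "re3", "none"])
for _ in range(100):
    choice = selector.select(rng)
    ret = {"icm": 1.0, "rnd": 2.0, "re3": 1.5, "none": 0.5}[choice] + rng.normal()
    selector.update(choice, ret)
print(selector.select(rng))          # converges toward the best-returning scheme
```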
    When and How to Fool Explainable Models (and Humans) with Adversarial Examples. (arXiv:2107.01943v2 [cs.LG] UPDATED)
    Reliable deployment of machine learning models such as neural networks continues to be challenging due to several limitations. Some of the main shortcomings are the lack of interpretability and the lack of robustness against adversarial examples or out-of-distribution inputs. In this exploratory review, we explore the possibilities and limits of adversarial attacks for explainable machine learning models. First, we extend the notion of adversarial examples to fit in explainable machine learning scenarios, in which the inputs, the output classifications and the explanations of the model's decisions are assessed by humans. Next, we propose a comprehensive framework to study whether (and how) adversarial examples can be generated for explainable models under human assessment, introducing and illustrating novel attack paradigms. In particular, our framework considers a wide range of relevant yet often ignored factors such as the type of problem, the user expertise or the objective of the explanations, in order to identify the attack strategies that should be adopted in each scenario to successfully deceive the model (and the human). The intention of these contributions is to serve as a basis for a more rigorous and realistic study of adversarial examples in the field of explainable machine learning.  ( 3 min )
    Layer Ensembles. (arXiv:2210.04882v3 [cs.LG] UPDATED)
    Deep Ensembles, as a type of Bayesian Neural Networks, can be used to estimate uncertainty on the prediction of multiple neural networks by collecting votes from each network and computing the difference in those predictions. In this paper, we introduce a method for uncertainty estimation that considers a set of independent categorical distributions for each layer of the network, giving many more possible samples with overlapped layers than in the regular Deep Ensembles. We further introduce an optimized inference procedure that reuses common layer outputs, achieving up to 19x speed up and reducing memory usage quadratically. We also show that the method can be further improved by ranking samples, resulting in models that require less memory and time to run while achieving higher uncertainty quality than Deep Ensembles.  ( 2 min )
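    The per-layer sampling can be sketched as follows: each depth holds k candidate layers, a sample is a tuple of member indices, and k^L distinct networks share weights. The caching of common layer outputs (the paper's optimized inference) is omitted here for brevity:

```python
import torch

class LayerEnsemble(torch.nn.Module):
    """Per-layer ensemble: each depth holds k candidate layers and one is
    sampled independently per forward pass (cf. Layer Ensembles)."""
    def __init__(self, sizes, k=3):
        super().__init__()
        self.choices = torch.nn.ModuleList(
            torch.nn.ModuleList(torch.nn.Linear(i, o) for _ in range(k))
            for i, o in zip(sizes[:-1], sizes[1:]))

    def forward(self, x, sample):
        for depth, member in enumerate(sample):
            x = self.choices[depth][member](x)
            if depth < len(sample) - 1:
                x = torch.relu(x)
        return x

net = LayerEnsemble([8, 16, 16, 4], k=3)
x = torch.randn(2, 8)
samples = [torch.randint(0, 3, (3,)).tolist() for _ in range(5)]
preds = torch.stack([net(x, s) for s in samples])
print(preds.var(dim=0).mean())       # disagreement across samples ~ uncertainty
```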
    Black-Box Batch Active Learning for Regression. (arXiv:2302.08981v2 [cs.LG] UPDATED)
    Batch active learning is a popular approach for efficiently training machine learning models on large, initially unlabelled datasets by repeatedly acquiring labels for batches of data points. However, many recent batch active learning methods are white-box approaches and are often limited to differentiable parametric models: they score unlabeled points using acquisition functions based on model embeddings or first- and second-order derivatives. In this paper, we propose black-box batch active learning for regression tasks as an extension of white-box approaches. Crucially, our method only relies on model predictions. This approach is compatible with a wide range of machine learning models, including regular and Bayesian deep learning models and non-differentiable models such as random forests. It is rooted in Bayesian principles and utilizes recent kernel-based approaches. This allows us to extend a wide range of existing state-of-the-art white-box batch active learning methods (BADGE, BAIT, LCMD) to black-box models. We demonstrate the effectiveness of our approach through extensive experimental evaluations on regression datasets, achieving surprisingly strong performance compared to white-box approaches for deep learning models.  ( 2 min )
    Decentralized Learning over Wireless Networks: The Effect of Broadcast with Random Access. (arXiv:2305.07368v2 [cs.NI] UPDATED)
    In this work, we focus on the communication aspect of decentralized learning, which involves multiple agents training a shared machine learning model using decentralized stochastic gradient descent (D-SGD) over distributed data. In particular, we investigate the impact of broadcast transmission and probabilistic random access policy on the convergence performance of D-SGD, considering the broadcast nature of wireless channels and the link dynamics in the communication topology. Our results demonstrate that optimizing the access probability to maximize the expected number of successful links is a highly effective strategy for accelerating the system convergence.  ( 2 min )
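    The interplay between access probability and successful links can be simulated with a toy collision model: each agent broadcasts with probability p, and a node receives a neighbor's update only when exactly one of its neighbors transmitted and it stayed silent itself. The collision model and pairwise averaging below are entirely our assumptions:

```python
import numpy as np

def broadcast_mixing_step(params, adjacency, p, rng):
    """One mixing step under probabilistic random access: node j receives
    i's broadcast only if i is j's sole transmitting neighbor (toy model)."""
    n = len(params)
    tx = rng.random(n) < p                               # who transmits this slot
    new_params = [x.copy() for x in params]
    for j in range(n):
        senders = [i for i in range(n) if adjacency[j][i] and tx[i]]
        if len(senders) == 1 and not tx[j]:              # no collision, half-duplex
            new_params[j] = 0.5 * (params[j] + params[senders[0]])
    return new_params

rng = np.random.default_rng(0)
adjacency = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])  # fully connected triangle
params = [np.full(4, v) for v in (0.0, 1.0, 2.0)]
for _ in range(200):
    params = broadcast_mixing_step(params, adjacency, p=0.4, rng=rng)
print([float(x.mean().round(3)) for x in params])        # values drift toward consensus
```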
    Breadth-First Pipeline Parallelism. (arXiv:2211.05953v2 [cs.DC] UPDATED)
    We introduce Breadth-First Pipeline Parallelism, a novel training schedule which optimizes the combination of pipeline and data parallelism. Breadth-First Pipeline Parallelism lowers training time, cost and memory usage by combining a high GPU utilization with a small batch size per GPU, and by making use of fully sharded data parallelism. Experimentally, we observed an increase of up to 43% in training throughput for a 52 billion-parameter model using a small batch size per GPU compared to Megatron-LM, which would reduce the training time and cost by the same amount on a large GPU cluster.  ( 2 min )
    GRAPHSHAP: Explaining Identity-Aware Graph Classifiers Through the Language of Motifs. (arXiv:2202.08815v2 [cs.LG] UPDATED)
    Most methods for explaining black-box classifiers (e.g. on tabular data, images, or time series) rely on measuring the impact that removing/perturbing features has on the model output. This forces the explanation language to match the classifier's feature space. However, when dealing with graph data, in which the basic features correspond to the edges describing the graph structure, this matching between features space and explanation language might not be appropriate. Decoupling the feature space (edges) from a desired high-level explanation language (such as motifs) is thus a major challenge towards developing actionable explanations for graph classification tasks. In this paper we introduce GRAPHSHAP, a Shapley-based approach able to provide motif-based explanations for identity-aware graph classifiers, assuming no knowledge whatsoever about the model or its training data: the only requirement is that the classifier can be queried as a black-box at will. For the sake of computational efficiency we explore a progressive approximation strategy and show how a simple kernel can efficiently approximate explanation scores, thus allowing GRAPHSHAP to scale on scenarios with a large explanation space (i.e. large number of motifs). We showcase GRAPHSHAP on a real-world brain-network dataset consisting of patients affected by Autism Spectrum Disorder and a control group. Our experiments highlight how the classification provided by a black-box model can be effectively explained by few connectomics patterns.  ( 3 min )
    Don't trust your eyes: on the (un)reliability of feature visualizations. (arXiv:2306.04719v4 [cs.CV] UPDATED)
    How do neural networks extract patterns from pixels? Feature visualizations attempt to answer this important question by visualizing highly activating patterns through optimization. Today, visualization methods form the foundation of our knowledge about the internal workings of neural networks, as a type of mechanistic interpretability. Here we ask: How reliable are feature visualizations? We start our investigation by developing network circuits that trick feature visualizations into showing arbitrary patterns that are completely disconnected from normal network behavior on natural input. We then provide evidence for a similar phenomenon occurring in standard, unmanipulated networks: feature visualizations are processed very differently from standard input, casting doubt on their ability to "explain" how neural networks process natural images. We underpin this empirical finding with theory: we prove that the set of functions that can be reliably understood by feature visualization is extremely small and does not include general black-box neural networks. Therefore, a promising way forward could be the development of networks that enforce certain structures in order to ensure more reliable feature visualizations.  ( 2 min )
    Self-Supervised Time Series Representation Learning via Cross Reconstruction Transformer. (arXiv:2205.09928v2 [cs.LG] UPDATED)
    Unsupervised/self-supervised representation learning in time series is critical since labeled samples are usually scarce in real-world scenarios. Existing approaches mainly leverage the contrastive learning framework, which automatically learns to distinguish similar and dissimilar data pairs. Nevertheless, these approaches depend on prior knowledge for constructing pairs, require cumbersome sampling policies, and can perform unstably when encountering sampling bias. Also, few works have focused on effectively modeling temporal-spectral relations to extend the capacity of representations. In this paper, we aim at learning representations for time series from a new perspective and propose Cross Reconstruction Transformer (CRT) to solve the aforementioned problems in a unified way. CRT achieves time series representation learning through a cross-domain dropping-reconstruction task. Specifically, we transform time series into the frequency domain and randomly drop certain parts in both time and frequency domains. Dropping can maximally preserve the global context compared to cropping and masking. Then a transformer architecture is utilized to adequately capture the cross-domain correlations between temporal and spectral information through reconstructing data in both domains, which is called Dropped Temporal-Spectral Modeling. To discriminate the representations in global latent space, we propose an Instance Discrimination Constraint to reduce the mutual information between different time series and sharpen the decision boundaries. Additionally, we propose a specified curriculum learning strategy to optimize the CRT, which progressively increases the dropping ratio in the training process.  ( 3 min )
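    A rough sketch of the cross-domain dropping step described above (the shapes, drop ratio, and elementwise rather than chunked dropping are simplifying assumptions, not the paper's exact recipe):

    ```python
    # Drop random parts of a time series in both the time and frequency domains;
    # a transformer would then be trained to reconstruct the dropped entries.
    import torch

    def random_drop(x, ratio=0.3):
        """Zero out a random subset of positions along the last dim; return mask."""
        mask = torch.rand(x.shape[-1]) < ratio
        dropped = x.clone()
        dropped[..., mask] = 0
        return dropped, mask

    series = torch.randn(8, 256)        # (batch, time)
    spectrum = torch.fft.rfft(series)   # frequency-domain view of the same data

    time_in, time_mask = random_drop(series)
    freq_in, freq_mask = random_drop(spectrum)

    # The encoder consumes [time_in, freq_in] jointly and is trained to
    # reconstruct the masked positions in both domains (the reconstruction
    # loss and the transformer itself are omitted here).
    ```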
    Coupled Gradient Flows for Strategic Non-Local Distribution Shift. (arXiv:2307.01166v2 [cs.LG] UPDATED)
    We propose a novel framework for analyzing the dynamics of distribution shift in real-world systems that captures the feedback loop between learning algorithms and the distributions on which they are deployed. Prior work largely models feedback-induced distribution shift as adversarial or via an overly simplistic distribution-shift structure. In contrast, we propose a coupled partial differential equation model that captures fine-grained changes in the distribution over time by accounting for complex dynamics that arise due to strategic responses to algorithmic decision-making, non-local endogenous population interactions, and other exogenous sources of distribution shift. We consider two common settings in machine learning: cooperative settings with information asymmetries, and competitive settings where a learner faces strategic users. For both of these settings, when the algorithm retrains via gradient descent, we prove asymptotic convergence of the retraining procedure to a steady-state, both in finite and in infinite dimensions, obtaining explicit rates in terms of the model parameters. To do so we derive new results on the convergence of coupled PDEs that extend what is known about multi-species systems. Empirically, we show that our approach captures well-documented forms of distribution shifts like polarization and disparate impacts that simpler models cannot capture.  ( 2 min )
    SocNavGym: A Reinforcement Learning Gym for Social Navigation. (arXiv:2304.14102v2 [cs.RO] UPDATED)
    It is essential for autonomous robots to be socially compliant while navigating in human-populated environments. Machine Learning and, especially, Deep Reinforcement Learning have recently gained considerable traction in the field of Social Navigation. This can be partially attributed to the resulting policies not being bound by human limitations in terms of code complexity or the number of variables that are handled. Unfortunately, the lack of safety guarantees and the large data requirements of DRL algorithms make learning in the real world unfeasible. To bridge this gap, simulation environments are frequently used. We propose SocNavGym, an advanced simulation environment for social navigation that can generate a wide variety of social navigation scenarios and facilitates the development of intelligent social agents. SocNavGym is light-weight, fast, easy-to-use, and can be effortlessly configured to generate different types of social navigation scenarios. It can also be configured to work with different hand-crafted and data-driven social reward signals and to yield a variety of evaluation metrics to benchmark agents' performance. Further, we also provide a case study where a Dueling-DQN agent is trained to learn social-navigation policies using SocNavGym. The results provide evidence that SocNavGym can be used to train an agent from scratch to navigate in simple as well as complex social scenarios. Our experiments also show that the agents trained using the data-driven reward function display more advanced social compliance than those trained with the heuristic-based reward function.  ( 3 min )
    Equivariance with Learned Canonicalization Functions. (arXiv:2211.06489v3 [cs.LG] UPDATED)
    Symmetry-based neural networks often constrain the architecture in order to achieve invariance or equivariance to a group of transformations. In this paper, we propose an alternative that avoids this architectural constraint by learning to produce canonical representations of the data. These canonicalization functions can readily be plugged into non-equivariant backbone architectures. We offer explicit ways to implement them for some groups of interest. We show that this approach enjoys universality while providing interpretable insights. Our main hypothesis, supported by our empirical results, is that learning a small neural network to perform canonicalization is better than using predefined heuristics. Our experiments show that learning the canonicalization function is competitive with existing techniques for learning equivariant functions across many tasks, including image classification, $N$-body dynamics prediction, point cloud classification and part segmentation, while being faster across the board.  ( 2 min )
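    A minimal sketch of the idea for planar rotations (sizes are arbitrary, and in the paper the canonicalization function itself must respect the group structure, a requirement this toy pooling construction glosses over):

    ```python
    # Learn a function that maps each point cloud to a canonical orientation,
    # then feed the canonicalized input to an ordinary, non-equivariant backbone.
    import torch
    import torch.nn as nn

    class Canonicalizer(nn.Module):
        def __init__(self):
            super().__init__()
            # tiny pointwise network predicting one rotation angle per cloud
            self.net = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))

        def forward(self, pts):                  # pts: (batch, n_points, 2)
            theta = self.net(pts).mean(dim=1)    # (batch, 1), pooled over points
            c, s = torch.cos(theta), torch.sin(theta)
            rot = torch.stack([torch.cat([c, s], -1),
                               torch.cat([-s, c], -1)], dim=1)  # R(-theta)
            return pts @ rot.transpose(1, 2)     # rotate into canonical pose

    canon = Canonicalizer()
    backbone = nn.Sequential(nn.Flatten(), nn.Linear(2 * 64, 10))  # non-equivariant
    x = torch.randn(4, 64, 2)
    logits = backbone(canon(x))   # backbone only ever sees canonicalized inputs
    ```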
    Detection of Groups with Biased Representation in Ranking. (arXiv:2301.00719v2 [cs.LG] UPDATED)
    Real-life tools for decision-making in many critical domains are based on ranking results. With the increasing awareness of algorithmic fairness, recent works have presented measures for fairness in ranking. Many of those definitions consider the representation of different ``protected groups'', in the top-$k$ ranked items, for any reasonable $k$. Given the protected groups, confirming algorithmic fairness is a simple task. However, the groups' definitions may be unknown in advance. In this paper, we study the problem of detecting groups with biased representation in the top-$k$ ranked items, eliminating the need to pre-define protected groups. The number of possible such groups can be exponential, making the problem hard. We propose efficient search algorithms for two different fairness measures: global representation bounds, and proportional representation. Then we propose a method to explain the bias in the representations of groups utilizing the notion of Shapley values. We conclude with an experimental study, showing the scalability of our approach and demonstrating the usefulness of the proposed algorithms.  ( 2 min )
    Elastic Decision Transformer. (arXiv:2307.02484v2 [cs.LG] UPDATED)
    This paper introduces Elastic Decision Transformer (EDT), a significant advancement over the existing Decision Transformer (DT) and its variants. Although DT purports to generate an optimal trajectory, empirical evidence suggests it struggles with trajectory stitching, a process involving the generation of an optimal or near-optimal trajectory from the best parts of a set of sub-optimal trajectories. The proposed EDT differentiates itself by facilitating trajectory stitching during action inference at test time, achieved by adjusting the history length maintained in DT. Further, the EDT optimizes the trajectory by retaining a longer history when the previous trajectory is optimal and a shorter one when it is sub-optimal, enabling it to "stitch" with a more optimal trajectory. Extensive experimentation demonstrates EDT's ability to bridge the performance gap between DT-based and Q Learning-based approaches. In particular, the EDT outperforms Q Learning-based methods in a multi-task regime on the D4RL locomotion benchmark and Atari games. Videos are available at: https://kristery.github.io/edt/  ( 2 min )
    Deep Optimal Transport for Domain Adaptation on SPD Manifolds. (arXiv:2201.05745v3 [cs.LG] UPDATED)
    In recent years, there has been significant interest in solving the domain adaptation (DA) problem on symmetric positive definite (SPD) manifolds within the machine learning community. This interest stems from the fact that complex neurophysiological data generated by medical equipment, such as electroencephalograms, magnetoencephalograms, and diffusion tensor imaging, often exhibit a shift in data distribution across different domains. These data, represented by signal covariance matrices, possess properties of symmetry and positive definiteness. However, directly applying previous experiences and solutions to the DA problem poses challenges due to the manipulation complexities of covariance matrices. To address this, our research introduces a category of deep learning-based transfer learning approaches called deep optimal transport. This category utilizes optimal transport theory and leverages the Log-Euclidean geometry for SPD manifolds. Additionally, we present a comprehensive categorization of existing geometric methods to tackle these problems effectively. This categorization provides practical solutions for specific DA problems, including handling discrepancies in marginal and conditional distributions between the source and target domains on the SPD manifold. To evaluate the effectiveness, we conduct experiments on three publicly available highly non-stationary cross-session brain-computer interface scenarios. Moreover, we provide visualization results on the SPD cone to offer further insights into the framework.  ( 3 min )
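    As background, a minimal sketch of the Log-Euclidean geometry the framework leverages: distances between SPD matrices become Euclidean distances between their matrix logarithms (the random covariances below are stand-ins for real signal covariance matrices):

    ```python
    # Log-Euclidean distance between two SPD covariance matrices.
    import numpy as np
    from scipy.linalg import logm

    def log_euclidean_distance(A, B):
        """d(A, B) = || logm(A) - logm(B) ||_F on the SPD manifold."""
        return np.linalg.norm(logm(A) - logm(B), ord="fro")

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 8))
    Y = rng.standard_normal((100, 8))
    A = X.T @ X / 100 + 1e-6 * np.eye(8)   # SPD covariance estimates
    B = Y.T @ Y / 100 + 1e-6 * np.eye(8)
    print(log_euclidean_distance(A, B))
    ```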
    A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony in Talking Head Generation. (arXiv:2307.03270v1 [cs.GR])
    Animating still face images with deep generative models using a speech input signal is an active research topic and has seen important recent progress. However, much of the effort has been put into lip syncing and rendering quality while the generation of natural head motion, let alone the audio-visual correlation between head motion and speech, has often been neglected. In this work, we propose a multi-scale audio-visual synchrony loss and a multi-scale autoregressive GAN to better handle short and long-term correlation between speech and the dynamics of the head and lips. In particular, we train a stack of syncer models on multimodal input pyramids and use these models as guidance in a multi-scale generator network to produce audio-aligned motion unfolding over diverse time scales. Our generator operates in the facial landmark domain, which is a standard low-dimensional head representation. The experiments show significant improvements over the state of the art in head motion dynamics quality and in multi-scale audio-visual synchrony both in the landmark domain and in the image domain.  ( 2 min )
    Analyzing the vulnerabilities in SplitFed Learning: Assessing the robustness against Data Poisoning Attacks. (arXiv:2307.03197v1 [cs.LG])
    Distributed Collaborative Machine Learning (DCML) is a potential alternative to address the privacy concerns associated with centralized machine learning. Split Learning (SL) and Federated Learning (FL) are two effective learning approaches in DCML. Recently there has been increased interest in the hybrid of FL and SL known as SplitFed Learning (SFL). This research is the earliest attempt to study, analyze and present the impact of data poisoning attacks in SFL. We propose three kinds of novel attack strategies, namely untargeted, targeted and distance-based attacks, for SFL. All the attack strategies aim to degrade the performance of the DCML-based classifier. We test the proposed attack strategies for two different case studies: electrocardiogram signal classification and automatic handwritten digit recognition. A series of attack experiments were conducted by varying the percentage of malicious clients and the choice of the model split layer between the clients and the server. The results of the comprehensive analysis of attack strategies clearly convey that untargeted and distance-based poisoning attacks have a greater impact on evading the classifier outcomes compared to targeted attacks in SFL.  ( 2 min )
    Assisting Clinical Decisions for Scarcely Available Treatment via Disentangled Latent Representation. (arXiv:2307.03315v1 [cs.LG])
    Extracorporeal membrane oxygenation (ECMO) is an essential life-supporting modality for COVID-19 patients who are refractory to conventional therapies. However, the proper treatment decision has been the subject of significant debate and it remains controversial who benefits from this scarcely available and technically complex treatment option. To support clinical decisions, there is a critical need to predict the treatment need and the potential treatment and no-treatment responses. Targeting this clinical challenge, we propose Treatment Variational AutoEncoder (TVAE), a novel approach for individualized treatment analysis. TVAE is specifically designed to address modeling challenges, like those of ECMO, with strong treatment selection bias and scarce treatment cases. TVAE conceptualizes the treatment decision as a multi-scale problem. We model a patient's potential treatment assignment and the factual and counterfactual outcomes as part of their intrinsic characteristics that can be represented by a deep latent variable model. The factual and counterfactual prediction errors are alleviated via a reconstruction regularization scheme together with semi-supervision, and the selection bias and the scarcity of treatment cases are mitigated by the disentangled and distribution-matched latent space and the label-balancing generative strategy. We evaluate TVAE on two real-world COVID-19 datasets: an international dataset collected from 1651 hospitals across 63 countries, and an institutional dataset collected from 15 hospitals. The results show that TVAE outperforms state-of-the-art treatment effect models in predicting both the propensity scores and factual outcomes on heterogeneous COVID-19 datasets. Additional experiments also show TVAE outperforms the best existing models in individual treatment effect estimation on the synthesized IHDP benchmark dataset.  ( 3 min )
    OmniBoost: Boosting Throughput of Heterogeneous Embedded Devices under Multi-DNN Workload. (arXiv:2307.03290v1 [cs.LG])
    Modern Deep Neural Networks (DNNs) exhibit profound efficiency and accuracy properties. This has introduced application workloads that comprise multiple DNN applications, raising new challenges regarding workload distribution. Equipped with a diverse set of accelerators, newer embedded systems present architectural heterogeneity, which current run-time controllers are unable to fully utilize. To enable high throughput in multi-DNN workloads, such a controller ought to explore hundreds of thousands of possible solutions to exploit the underlying heterogeneity. In this paper, we propose OmniBoost, a lightweight and extensible multi-DNN manager for heterogeneous embedded devices. We leverage stochastic space exploration combined with a highly accurate performance estimator to observe a 4.6x average throughput boost compared to other state-of-the-art methods. The evaluation was performed on the HiKey970 development board.  ( 2 min )
    ACDNet: Attention-guided Collaborative Decision Network for Effective Medication Recommendation. (arXiv:2307.03332v1 [cs.LG])
    Medication recommendation using Electronic Health Records (EHR) is challenging due to complex medical data. Current approaches extract longitudinal information from patient EHR to personalize recommendations. However, existing models often lack sufficient patient representation and overlook the importance of considering the similarity between a patient's medication records and specific medicines. Therefore, an Attention-guided Collaborative Decision Network (ACDNet) for medication recommendation is proposed in this paper. Specifically, ACDNet utilizes an attention mechanism and a Transformer to effectively capture patient health conditions and medication records by modeling their historical visits at both global and local levels. ACDNet also employs a collaborative decision framework, utilizing the similarity between medication records and medicine representation to facilitate the recommendation process. The experimental results on two extensive medical datasets, MIMIC-III and MIMIC-IV, clearly demonstrate that ACDNet outperforms state-of-the-art models in terms of Jaccard, PR-AUC, and F1 score, reaffirming its superiority. Moreover, the ablation experiments provide solid evidence of the effectiveness of each module in ACDNet, validating their contribution to the overall performance. Furthermore, a detailed case study reinforces the effectiveness of ACDNet in medication recommendation based on EHR data, showcasing its practical value in real-world healthcare scenarios.  ( 2 min )
    Empirical Analysis of a Segmentation Foundation Model in Prostate Imaging. (arXiv:2307.03266v1 [eess.IV])
    Most state-of-the-art techniques for medical image segmentation rely on deep-learning models. These models, however, are often trained on narrowly-defined tasks in a supervised fashion, which requires expensive labeled datasets. Recent advances in several machine learning domains, such as natural language generation, have demonstrated the feasibility and utility of building foundation models that can be customized for various downstream tasks with little to no labeled data. This likely represents a paradigm shift for medical imaging, where we expect that foundation models may shape the future of the field. In this paper, we consider a recently developed foundation model for medical image segmentation, UniverSeg. We conduct an empirical evaluation study in the context of prostate imaging and compare it against the conventional approach of training a task-specific segmentation model. Our results and discussion highlight several important factors that will likely be important in the development and adoption of foundation models for medical image segmentation.  ( 2 min )
    Neural Network Field Theories: Non-Gaussianity, Actions, and Locality. (arXiv:2307.03223v1 [hep-th])
    Both the path integral measure in field theory and ensembles of neural networks describe distributions over functions. When the central limit theorem can be applied in the infinite-width (infinite-$N$) limit, the ensemble of networks corresponds to a free field theory. Although an expansion in $1/N$ corresponds to interactions in the field theory, others, such as in a small breaking of the statistical independence of network parameters, can also lead to interacting theories. These other expansions can be advantageous over the $1/N$-expansion, for example by improved behavior with respect to the universal approximation theorem. Given the connected correlators of a field theory, one can systematically reconstruct the action order-by-order in the expansion parameter, using a new Feynman diagram prescription whose vertices are the connected correlators. This method is motivated by the Edgeworth expansion and allows one to derive actions for neural network field theories. Conversely, the correspondence allows one to engineer architectures realizing a given field theory by representing action deformations as deformations of neural network parameter densities. As an example, $\phi^4$ theory is realized as an infinite-$N$ neural network field theory.  ( 2 min )
    Vision Language Transformers: A Survey. (arXiv:2307.03254v1 [cs.CV])
    Vision language tasks, such as answering questions about or generating captions that describe an image, are difficult tasks for computers to perform. A relatively recent body of research has adapted the pretrained transformer architecture introduced in \citet{vaswani2017attention} to vision language modeling. Transformer models have greatly improved performance and versatility over previous vision language models. They do so by pretraining models on large generic datasets and transferring their learning to new tasks with minor changes in architecture and parameter values. This type of transfer learning has become the standard modeling practice in both natural language processing and computer vision. Vision language transformers offer the promise of producing similar advancements in tasks which require both vision and language. In this paper, we provide a broad synthesis of the currently available research on vision language transformer models and offer some analysis of their strengths, limitations and some open questions that remain.  ( 2 min )
    Scaling Laws Do Not Scale. (arXiv:2307.03201v1 [cs.LG])
    Recent work has proposed a power law relationship, referred to as ``scaling laws,'' between the performance of artificial intelligence (AI) models and aspects of those models' design (e.g., dataset size). In other words, as the size of a dataset (or model parameters, etc) increases, the performance of a given model trained on that dataset will correspondingly increase. However, while compelling in the aggregate, this scaling law relationship overlooks the ways that metrics used to measure performance may be precarious and contested, or may not correspond with how different groups of people may perceive the quality of models' output. In this paper, we argue that as the size of datasets used to train large AI models grows, the number of distinct communities (including demographic groups) whose data is included in a given dataset is likely to grow, each of whom may have different values. As a result, there is an increased risk that communities represented in a dataset may have values or preferences not captured by (or in the worst case, at odds with) the metrics used to evaluate model performance for scaling laws. We end the paper with implications for AI scaling laws -- that models may not, in fact, continue to improve as the datasets get larger -- at least not for all people or communities impacted by those models.  ( 2 min )
    On Invariance, Equivariance, Correlation and Convolution of Spherical Harmonic Representations for Scalar and Vectorial Data. (arXiv:2307.03311v1 [cs.LG])
    The mathematical representation of data in the Spherical Harmonic (SH) domain has recently regained increasing interest in the machine learning community. This technical report gives an in-depth introduction to the theoretical foundation and practical implementation of SH representations, summarizing works on rotation invariant and equivariant features, as well as convolutions and exact correlations of signals on spheres. In extension, these methods are then generalized from scalar SH representations to Vectorial Harmonics (VH), providing the same capabilities for 3d vector fields on spheres.  ( 2 min )
  • Open

    Federated Variational Inference Methods for Structured Latent Variable Models. (arXiv:2302.03314v2 [stat.ML] UPDATED)
    Federated learning methods enable model training across distributed data sources without data leaving their original locations and have gained increasing interest in various fields. However, existing approaches are limited, excluding many structured probabilistic models. We present a general and elegant solution based on structured variational inference, widely used in Bayesian machine learning, adapted for the federated setting. Additionally, we provide a communication-efficient variant analogous to the canonical FedAvg algorithm. The proposed algorithms' effectiveness is demonstrated, and their performance is compared with hierarchical Bayesian neural networks and topic models.
    Scalable High-Dimensional Multivariate Linear Regression for Feature-Distributed Data. (arXiv:2307.03410v1 [stat.ML])
    Feature-distributed data, i.e. data partitioned by features and stored across multiple computing nodes, are increasingly common in applications with a large number of features. This paper proposes a two-stage relaxed greedy algorithm (TSRGA) for applying multivariate linear regression to such data. The main advantage of TSRGA is that its communication complexity does not depend on the feature dimension, making it highly scalable to very large data sets. In addition, for multivariate response variables, TSRGA can be used to yield low-rank coefficient estimates. The fast convergence of TSRGA is validated by simulation experiments. Finally, we apply the proposed TSRGA in a financial application that leverages unstructured data from the 10-K reports, demonstrating its usefulness in applications with many dense large-dimensional matrices.  ( 2 min )
    Black-Box Batch Active Learning for Regression. (arXiv:2302.08981v2 [cs.LG] UPDATED)
    Batch active learning is a popular approach for efficiently training machine learning models on large, initially unlabelled datasets by repeatedly acquiring labels for batches of data points. However, many recent batch active learning methods are white-box approaches and are often limited to differentiable parametric models: they score unlabeled points using acquisition functions based on model embeddings or first- and second-order derivatives. In this paper, we propose black-box batch active learning for regression tasks as an extension of white-box approaches. Crucially, our method only relies on model predictions. This approach is compatible with a wide range of machine learning models, including regular and Bayesian deep learning models and non-differentiable models such as random forests. It is rooted in Bayesian principles and utilizes recent kernel-based approaches. This allows us to extend a wide range of existing state-of-the-art white-box batch active learning methods (BADGE, BAIT, LCMD) to black-box models. We demonstrate the effectiveness of our approach through extensive experimental evaluations on regression datasets, achieving surprisingly strong performance compared to white-box approaches for deep learning models.
    Backward Feature Correction: How Deep Learning Performs Deep (Hierarchical) Learning. (arXiv:2001.04413v6 [cs.LG] UPDATED)
    Deep learning is also known as hierarchical learning, where the learner _learns_ to represent a complicated target function by decomposing it into a sequence of simpler functions to reduce sample and time complexity. This paper formally analyzes how multi-layer neural networks can perform such hierarchical learning _efficiently_ and _automatically_ by SGD on the training objective. On the conceptual side, we present a theoretical characterization of how certain types of deep (i.e. super-constant layer) neural networks can still be sample and time efficiently trained on some hierarchical tasks, when no existing algorithm (including layerwise training, kernel method, etc) is known to be efficient. We establish a new principle called "backward feature correction", where the errors in the lower-level features can be automatically corrected when training together with the higher-level layers. We believe this is a key behind how deep learning is performing deep (hierarchical) learning, as opposed to layerwise learning or simulating some non-hierarchical method. On the technical side, we show for every input dimension $d > 0$, there is a concept class of degree $\omega(1)$ multi-variate polynomials so that, using $\omega(1)$-layer neural networks as learners, SGD can learn any function from this class in $\mathsf{poly}(d)$ time to any $\frac{1}{\mathsf{poly}(d)}$ error, through learning to represent it as a composition of $\omega(1)$ layers of quadratic functions using "backward feature correction." In contrast, we do not know any other simpler algorithm (including layerwise training, applying kernel method sequentially, training a two-layer network, etc) that can learn this concept class in $\mathsf{poly}(d)$ time even to any $d^{-0.01}$ error. As a side result, we prove $d^{\omega(1)}$ lower bounds for several non-hierarchical learners, including any kernel methods.
    GeoPhy: Differentiable Phylogenetic Inference via Geometric Gradients of Tree Topologies. (arXiv:2307.03675v1 [cs.LG])
    Phylogenetic inference, grounded in molecular evolution models, is essential for understanding the evolutionary relationships in biological data. Accounting for the uncertainty of phylogenetic tree variables, which include tree topologies and evolutionary distances on branches, is crucial for accurately inferring species relationships from molecular data and tasks requiring variable marginalization. Variational Bayesian methods are key to developing scalable, practical models; however, it remains challenging to conduct phylogenetic inference without restricting the combinatorially vast number of possible tree topologies. In this work, we introduce a novel, fully differentiable formulation of phylogenetic inference that leverages a unique representation of topological distributions in continuous geometric spaces. Through practical considerations on design spaces and control variates for gradient estimations, our approach, GeoPhy, enables variational inference without limiting the topological candidates. In experiments using real benchmark datasets, GeoPhy significantly outperformed other approximate Bayesian methods that considered whole topologies.
    Identifying Patient-Specific Root Causes with the Heteroscedastic Noise Model. (arXiv:2205.13085v2 [stat.ML] UPDATED)
    Complex diseases are caused by a multitude of factors that may differ between patients even within the same diagnostic category. A few underlying root causes may nevertheless initiate the development of disease within each patient. We therefore focus on identifying patient-specific root causes of disease, which we equate to the sample-specific predictivity of the exogenous error terms in a structural equation model. We generalize from the linear setting to the heteroscedastic noise model where $Y = m(X) + \varepsilon\sigma(X)$ with non-linear functions $m(X)$ and $\sigma(X)$ representing the conditional mean and mean absolute deviation, respectively. This model preserves identifiability but introduces non-trivial challenges that require a customized algorithm called Generalized Root Causal Inference (GRCI) to extract the error terms correctly. GRCI recovers patient-specific root causes more accurately than existing alternatives.  ( 2 min )
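    A hedged sketch of the residual extraction this model class permits (the regressors and the use of |residuals| as a proxy for the conditional mean absolute deviation are illustrative simplifications; the GRCI algorithm itself is more involved):

    ```python
    # Recover sample-specific error terms under Y = m(X) + eps * sigma(X).
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(0)
    X = rng.standard_normal((2000, 3))
    eps = rng.standard_normal(2000)
    Y = np.sin(X[:, 0]) + eps * (0.5 + X[:, 1] ** 2)   # ground-truth m and sigma

    m_hat = GradientBoostingRegressor().fit(X, Y)       # conditional mean
    resid = Y - m_hat.predict(X)
    s_hat = GradientBoostingRegressor().fit(X, np.abs(resid))  # spread proxy
    eps_hat = resid / np.clip(s_hat.predict(X), 1e-6, None)    # recovered errors
    ```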
    F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning. (arXiv:2004.11145v2 [cs.LG] UPDATED)
    Traditional centralized multi-agent reinforcement learning (MARL) algorithms are sometimes impractical in complicated applications, due to non-interactivity between agents, the curse of dimensionality and computational complexity. Hence, several decentralized MARL algorithms have been proposed. However, existing decentralized methods only handle the fully cooperative setting where massive information needs to be transmitted in training. The block coordinate gradient descent scheme they used for successive independent actor and critic steps can simplify the calculation, but it causes serious bias. In this paper, we propose a flexible fully decentralized actor-critic MARL framework, which can combine most actor-critic methods, and handle large-scale general cooperative multi-agent settings. A primal-dual hybrid gradient descent type algorithm framework is designed to learn individual agents separately for decentralization. From the perspective of each agent, policy improvement and value evaluation are jointly optimized, which can stabilize multi-agent policy learning. Furthermore, our framework can achieve scalability and stability for large-scale environments and reduce information transmission, by the parameter sharing mechanism and a novel modeling-other-agents method based on theory-of-mind and online supervised learning. Extensive experiments in the cooperative Multi-agent Particle Environment and StarCraft II show that our decentralized MARL instantiation algorithms perform competitively against conventional centralized and decentralized methods.  ( 3 min )
    On the convergence of dynamic implementations of Hamiltonian Monte Carlo and No U-Turn Samplers. (arXiv:2307.03460v1 [stat.CO])
    There is substantial empirical evidence about the success of dynamic implementations of Hamiltonian Monte Carlo (HMC), such as the No U-Turn Sampler (NUTS), in many challenging inference problems, but theoretical results about their behavior are scarce. The aim of this paper is to fill this gap. More precisely, we consider a general class of MCMC algorithms we call dynamic HMC. We show that this general framework encompasses NUTS as a particular case, implying the invariance of the target distribution as a by-product. Second, we establish conditions under which NUTS is irreducible and aperiodic and, as a corollary, ergodic. Under conditions similar to the ones existing for HMC, we also show that NUTS is geometrically ergodic. Finally, we improve existing convergence results for HMC, showing that this method is ergodic without any boundedness condition on the stepsize and the number of leapfrog steps, in the case where the target is a perturbation of a Gaussian distribution.  ( 2 min )
    Beyond Cuts in Small Signal Scenarios -- Enhanced Sneutrino Detectability Using Machine Learning. (arXiv:2108.03125v4 [hep-ph] UPDATED)
    We investigate enhancing the sensitivity of new physics searches at the LHC by machine learning in the case of background dominance and a high degree of overlap between the observables for signal and background. We use two different models, XGBoost and a deep neural network, to exploit correlations between observables and compare this approach to the traditional cut-and-count method. We consider different methods to analyze the models' output, finding that a template fit generally performs better than a simple cut. By means of a Shapley decomposition, we gain additional insight into the relationship between event kinematics and the machine learning model output. We consider a supersymmetric scenario with a metastable sneutrino as a concrete example, but the methodology can be applied to a much wider class of models.  ( 2 min )
    Online Network Source Optimization with Graph-Kernel MAB. (arXiv:2307.03641v1 [cs.LG])
    We propose Grab-UCB, a graph-kernel multi-armed bandit algorithm to learn online the optimal source placement in large scale networks, such that the reward obtained from a priori unknown network processes is maximized. The uncertainty calls for online learning, which suffers however from the curse of dimensionality. To achieve sample efficiency, we describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations. This enables a data-efficient learning framework, whose learning rate scales with the dimension of the spectral representation model instead of the one of the network. We then propose Grab-UCB, an online sequential decision strategy that learns the parameters of the spectral representation while optimizing the action strategy. We derive performance guarantees that depend on network parameters, which further influence the learning curve of the sequential decision strategy. We introduce a computationally simplified solving method, Grab-arm-Light, an algorithm that walks along the edges of the polytope representing the objective function. Simulation results show that the proposed online learning algorithm outperforms baseline offline methods that typically separate the learning phase from the testing one. The results confirm the theoretical findings, and further highlight the gain of the proposed online learning strategy in terms of cumulative regret, sample efficiency and computational complexity.  ( 2 min )
    BOF-UCB: A Bayesian-Optimistic Frequentist Algorithm for Non-Stationary Contextual Bandits. (arXiv:2307.03587v1 [cs.LG])
    We propose a novel Bayesian-Optimistic Frequentist Upper Confidence Bound (BOF-UCB) algorithm for stochastic contextual linear bandits in non-stationary environments. This unique combination of Bayesian and frequentist principles enhances adaptability and performance in dynamic settings. The BOF-UCB algorithm utilizes sequential Bayesian updates to infer the posterior distribution of the unknown regression parameter, and subsequently employs a frequentist approach to compute the Upper Confidence Bound (UCB) by maximizing the expected reward over the posterior distribution. We provide theoretical guarantees of BOF-UCB's performance and demonstrate its effectiveness in balancing exploration and exploitation on synthetic datasets and classical control tasks in a reinforcement learning setting. Our results show that BOF-UCB outperforms existing methods, making it a promising solution for sequential decision-making in non-stationary environments.  ( 2 min )
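    A stationary skeleton of the Bayesian-update-plus-frequentist-UCB idea for a linear reward model (the actual BOF-UCB additionally handles non-stationarity, which this sketch omits; beta and the noise level are placeholder values):

    ```python
    # Sequential Bayesian update of a linear reward model plus an optimistic index.
    import numpy as np

    d = 5
    mu = np.zeros(d)        # prior mean of the regression parameter
    Sigma = np.eye(d)       # prior covariance
    noise = 1.0             # assumed reward noise variance
    beta = 2.0              # confidence width (placeholder)

    def update(mu, Sigma, x, r):
        """Posterior update after observing reward r for context x."""
        prec = np.linalg.inv(Sigma) + np.outer(x, x) / noise
        Sigma_new = np.linalg.inv(prec)
        mu_new = Sigma_new @ (np.linalg.inv(Sigma) @ mu + x * r / noise)
        return mu_new, Sigma_new

    def ucb(mu, Sigma, x):
        """Posterior-mean reward plus an optimism bonus; play the maximizing arm."""
        return x @ mu + beta * np.sqrt(x @ Sigma @ x)
    ```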
    MALIBO: Meta-learning for Likelihood-free Bayesian Optimization. (arXiv:2307.03565v1 [cs.LG])
    Bayesian optimization (BO) is a popular method to optimize costly black-box functions. While traditional BO optimizes each new target task from scratch, meta-learning has emerged as a way to leverage knowledge from related tasks to optimize new tasks faster. However, existing meta-learning BO methods rely on surrogate models that suffer from scalability issues and are sensitive to observations with different scales and noise types across tasks. Moreover, they often overlook the uncertainty associated with task similarity. This leads to unreliable task adaptation when only limited observations are obtained or when the new tasks differ significantly from the related tasks. To address these limitations, we propose a novel meta-learning BO approach that bypasses the surrogate model and directly learns the utility of queries across tasks. Our method explicitly models task uncertainty and includes an auxiliary model to enable robust adaptation to new tasks. Extensive experiments show that our method demonstrates strong anytime performance and outperforms state-of-the-art meta-learning BO methods in various benchmarks.  ( 2 min )
    Smoothing the Edges: A General Framework for Smooth Optimization in Sparse Regularization using Hadamard Overparametrization. (arXiv:2307.03571v1 [cs.LG])
    This paper introduces a smooth method for (structured) sparsity in $\ell_q$ and $\ell_{p,q}$ regularized optimization problems. Optimization of these non-smooth and possibly non-convex problems typically relies on specialized procedures. In contrast, our general framework is compatible with prevalent first-order optimization methods like Stochastic Gradient Descent and accelerated variants without any required modifications. This is accomplished through a smooth optimization transfer, comprising an overparametrization of selected model parameters using Hadamard products and a change of penalties. In the overparametrized problem, smooth and convex $\ell_2$ regularization of the surrogate parameters induces non-smooth and non-convex $\ell_q$ or $\ell_{p,q}$ regularization in the original parametrization. We show that our approach yields not only matching global minima but also equivalent local minima. This is particularly useful in non-convex sparse regularization, where finding global minima is NP-hard and local minima are known to generalize well. We provide a comprehensive overview consolidating various literature strands on sparsity-inducing parametrizations and propose meaningful extensions to existing approaches. The feasibility of our approach is evaluated through numerical experiments, which demonstrate that its performance is on par with or surpasses commonly used implementations of convex and non-convex regularization methods.  ( 2 min )
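    As a minimal sketch of the optimization transfer in its simplest, well-known special case (the lasso problem and all hyperparameters below are illustrative; the paper's $\ell_q$/$\ell_{p,q}$ variants generalize this): parametrizing $w = u \odot v$ and penalizing $u$ and $v$ with smooth $\ell_2$ terms induces $\ell_1$ regularization on $w$, so plain SGD applies.

    ```python
    # Smooth surrogate for an l1-regularized least-squares problem via the
    # Hadamard overparametrization w = u * v (illustrative toy problem).
    import torch

    n, d = 200, 50
    X = torch.randn(n, d)
    w_true = torch.zeros(d)
    w_true[:5] = 1.0
    y = X @ w_true + 0.1 * torch.randn(n)

    u = torch.randn(d, requires_grad=True)
    v = torch.randn(d, requires_grad=True)
    lam = 0.1
    opt = torch.optim.SGD([u, v], lr=1e-2)

    for _ in range(2000):
        opt.zero_grad()
        w = u * v                                      # Hadamard overparametrization
        loss = ((X @ w - y) ** 2).mean() \
            + lam * (u.pow(2).sum() + v.pow(2).sum())  # smooth, convex l2 penalties
        loss.backward()
        opt.step()

    # At a minimizer, lam * (u**2 + v**2) equals 2 * lam * |w| elementwise,
    # i.e. an l1 penalty on w, while every gradient step stays smooth.
    ```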
    Learning Theory of Distribution Regression with Neural Networks. (arXiv:2307.03487v1 [stat.ML])
    In this paper, we aim at establishing an approximation theory and a learning theory of distribution regression via a fully connected neural network (FNN). In contrast to the classical regression methods, the input variables of distribution regression are probability measures. Then we often need to perform a second-stage sampling process to approximate the actual information of the distribution. On the other hand, the classical neural network structure requires the input variable to be a vector. When the input samples are probability distributions, the traditional deep neural network method cannot be directly used and difficulties arise for distribution regression. A well-defined neural network structure for distribution inputs is highly desirable. There has been no mathematical model or theoretical analysis of neural network realizations of distribution regression. To overcome technical difficulties and address this issue, we establish a novel fully connected neural network framework to realize an approximation theory of functionals defined on the space of Borel probability measures. Furthermore, based on the established functional approximation results, in the hypothesis space induced by the novel FNN structure with distribution inputs, almost optimal learning rates for the proposed distribution regression model up to logarithmic terms are derived via a novel two-stage error decomposition technique.  ( 2 min )
    Stability and Generalization of Stochastic Compositional Gradient Descent Algorithms. (arXiv:2307.03357v1 [cs.LG])
    Many machine learning tasks can be formulated as a stochastic compositional optimization (SCO) problem, such as reinforcement learning, AUC maximization, and meta-learning, where the objective function involves a nested composition associated with an expectation. While a significant number of studies have been devoted to studying the convergence behavior of SCO algorithms, there is little work on understanding their generalization, i.e., how these learning algorithms built from training examples would behave on future test examples. In this paper, we provide the stability and generalization analysis of stochastic compositional gradient descent algorithms through the lens of algorithmic stability in the framework of statistical learning theory. Firstly, we introduce a stability concept called compositional uniform stability and establish its quantitative relation with generalization for SCO problems. Then, we establish the compositional uniform stability results for two popular stochastic compositional gradient descent algorithms, namely SCGD and SCSC. Finally, we derive dimension-independent excess risk bounds for SCGD and SCSC by trading off their stability results and optimization errors. To the best of our knowledge, these are the first-ever-known results on stability and generalization analysis of stochastic compositional gradient descent algorithms.  ( 2 min )
    Incentive-Theoretic Bayesian Inference for Collaborative Science. (arXiv:2307.03748v1 [stat.ME])
    Contemporary scientific research is a distributed, collaborative endeavor, carried out by teams of researchers, regulatory institutions, funding agencies, commercial partners, and scientific bodies, all interacting with each other and facing different incentives. To maintain scientific rigor, statistical methods should acknowledge this state of affairs. To this end, we study hypothesis testing when there is an agent (e.g., a researcher or a pharmaceutical company) with a private prior about an unknown parameter and a principal (e.g., a policymaker or regulator) who wishes to make decisions based on the parameter value. The agent chooses whether to run a statistical trial based on their private prior and then the result of the trial is used by the principal to reach a decision. We show how the principal can conduct statistical inference that leverages the information that is revealed by an agent's strategic behavior -- their choice to run a trial or not. In particular, we show how the principal can design a policy to elucidate partial information about the agent's private prior beliefs and use this to control the posterior probability of the null. One implication is a simple guideline for the choice of significance threshold in clinical trials: the type-I error level should be set to be strictly less than the cost of the trial divided by the firm's profit if the trial is successful.  ( 2 min )
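    To make the closing guideline concrete (the dollar figures below are invented purely for illustration): with trial cost $C$ and firm profit $\Pi$ if the trial succeeds, the suggested threshold is $\alpha < C / \Pi$; for instance, $C = \$5$M and $\Pi = \$100$M would give $\alpha < 0.05$, coinciding with the conventional significance level.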
    Multiclass Online Learning and Uniform Convergence. (arXiv:2303.17716v2 [cs.LG] UPDATED)
    We study multiclass classification in the agnostic adversarial online learning setting. As our main result, we prove that any multiclass concept class is agnostically learnable if and only if its Littlestone dimension is finite. This solves an open problem studied by Daniely, Sabato, Ben-David, and Shalev-Shwartz (2011,2015) who handled the case when the number of classes (or labels) is bounded. We also prove a separation between online learnability and online uniform convergence by exhibiting an easy-to-learn class whose sequential Rademacher complexity is unbounded. Our learning algorithm uses the multiplicative weights algorithm, with a set of experts defined by executions of the Standard Optimal Algorithm on subsequences of size Littlestone dimension. We argue that the best expert has regret at most Littlestone dimension relative to the best concept in the class. This differs from the well-known covering technique of Ben-David, P\'{a}l, and Shalev-Shwartz (2009) for binary classification, where the best expert has regret zero.  ( 2 min )
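    For context, the multiplicative weights aggregation at the core of the learner, in its generic form (in the paper each expert is an execution of the Standard Optimal Algorithm on a subsequence; here the experts are abstracted into a loss matrix):

    ```python
    # Generic multiplicative weights over a finite set of experts.
    import numpy as np

    def multiplicative_weights(losses, eta=0.5):
        """losses: (T, K) array, losses[t, k] = loss of expert k at round t.

        At each round the learner would play expert k with probability
        w[k] / w.sum(); here we only track the weights.
        """
        T, K = losses.shape
        w = np.ones(K)
        for t in range(T):
            w = w * np.exp(-eta * losses[t])  # exponentially downweight lossy experts
        return w / w.sum()

    # toy run: expert 1 incurs consistently smaller losses
    rng = np.random.default_rng(0)
    L = rng.random((100, 3))
    L[:, 1] *= 0.2
    print(multiplicative_weights(L))  # mass concentrates on expert 1
    ```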
    Continuous-Time Functional Diffusion Processes. (arXiv:2303.00800v2 [cs.LG] UPDATED)
    We introduce Functional Diffusion Processes (FDPs), which generalize score-based diffusion models to infinite-dimensional function spaces. FDPs require a new mathematical framework to describe the forward and backward dynamics, and several extensions to derive practical training objectives. These include infinite-dimensional versions of Girsanov theorem, in order to be able to compute an ELBO, and of the sampling theorem, in order to guarantee that functional evaluations in a countable set of points are equivalent to infinite-dimensional functions. We use FDPs to build a new breed of generative models in function spaces, which do not require specialized network architectures, and that can work with any kind of continuous data. Our results on real data show that FDPs achieve high-quality image generation, using a simple MLP architecture with orders of magnitude fewer parameters than existing diffusion models.  ( 2 min )
    Adaptive Strategies in Non-convex Optimization. (arXiv:2306.10278v2 [cs.LG] UPDATED)
    An algorithm is said to be adaptive to a certain parameter (of the problem) if it does not need a priori knowledge of such a parameter but performs competitively to those that know it. This dissertation presents our work on adaptive algorithms in the following scenarios: 1. In the stochastic optimization setting, we only receive stochastic gradients and the level of noise in evaluating them greatly affects the convergence rate. Without prior knowledge of the noise scale, tuning is typically required to achieve the optimal rate. Considering this, we designed and analyzed noise-adaptive algorithms that can automatically ensure (near)-optimal rates under different noise scales without knowing them. 2. In training deep neural networks, the scales of gradient magnitudes in each coordinate can scatter across a very wide range unless normalization techniques, like BatchNorm, are employed. In such situations, algorithms not addressing this problem of gradient scales can behave very poorly. To mitigate this, we formally established the advantage of scale-free algorithms that adapt to the gradient scales and presented their real benefits in empirical experiments. 3. Traditional analyses in non-convex optimization typically rely on the smoothness assumption. Yet, this condition does not capture the properties of some deep learning objective functions, including the ones involving Long Short-Term Memory networks and Transformers. Instead, they satisfy a much more relaxed condition, with potentially unbounded smoothness. Under this condition, we show that a generalized SignSGD algorithm can theoretically match the best-known convergence rates obtained by SGD with gradient clipping but does not need explicit clipping at all, and it can empirically match the performance of Adam and beat others. Moreover, it can also be made to automatically adapt to the unknown relaxed smoothness.  ( 3 min )
    Quantification of Uncertainty with Adversarial Models. (arXiv:2307.03217v1 [cs.LG])
    Quantifying uncertainty is important for actionable predictions in real-world applications. A crucial part of predictive uncertainty quantification is the estimation of epistemic uncertainty, which is defined as an integral of the product between a divergence function and the posterior. Current methods such as Deep Ensembles or MC dropout underperform at estimating the epistemic uncertainty, since they primarily consider the posterior when sampling models. We suggest Quantification of Uncertainty with Adversarial Models (QUAM) to better estimate the epistemic uncertainty. QUAM identifies regions where the whole product under the integral is large, not just the posterior. Consequently, QUAM has lower approximation error of the epistemic uncertainty compared to previous methods. Models for which the product is large correspond to adversarial models (not adversarial examples!). Adversarial models have both a high posterior as well as a high divergence between their predictions and that of a reference model. Our experiments show that QUAM excels in capturing epistemic uncertainty for deep learning models and outperforms previous methods on challenging tasks in the vision domain.  ( 2 min )

  • Open

    Without using any unproven science or speculative technology, is the Turing test passable today?
    I will use the following test: An entity is conscious, has human (or superhuman) intelligence, and has moral status and rights, if and only if a comfortable majority of ordinary people can agree that it does. (Can be applied to both AGI and transhuman endeavours). Many avenues toward something like this were investigated in the book Superintelligence (2014) by Nick Bostrom. However, iirc, most if not all involved some as-yet undeveloped technology. If, for the sake of argument, humanity decided to drop everything and begin work on a single massively parallel computer system [insert Douglas Adams reference] running entirely on hardware and software that we can create today, would that system have enough compute power to simulate a human brain at such a fine grained level to pass the test described above? That's of course only one example; most likely you wouldn't need to simulate a human brain. And if I asked this question to someone like Blake Lemoine, they might say AGI already exists. Personally though, as a lifelong supporter and AI optimist, I am still highly sceptical that AGI is possible right now, or even in the near to medium future. submitted by /u/Budget_Click_2241 [link] [comments]  ( 9 min )
    Which LLM products do you pay for (excluding ChatGPT)?
    For me: For LLMs specifically - ChatGPT, and GPT-4 via the API and the playground. I’d like to find more tools to use. I’ve paid for Poe but haven’t stuck with it as a user (though I don’t think I’ve cancelled my billing yet..). Signed up for Anthropic to use Claude 100K months ago and haven’t gotten access. Used it via Poe and it was cool but I wish it had GPT-4’s intelligence. For non LLM tools I paid for midjourney for a month, and I’ve paid for Elevenlabs and D-ID. Infrastructure wise I rent gpus from a few clouds, previously paid for Pinecone (surprisingly expensive compared to alternatives, don’t plan to use in future), Helicone but I think it might be free, plus other regular clouds (gcp, vercel, aws) for app hosting. submitted by /u/TikkunCreation [link] [comments]  ( 8 min )
    Are there any good AI websites or tools to visualize my room when I insert an image of it
    Looking for any good apps which integrate with AI and can generate the floor and wall color when I insert an image of my room. This is just for visualization purposes. Looking for free options preferably. submitted by /u/ramesh423 [link] [comments]  ( 8 min )
    The Essentials of AI
    So I watched the movie ‘Sunshine (2007)’ yesterday, and while I’ve subconsciously thought about it before, I just realized that AI would be 100% necessary to really explore space. Those quick decisions when something happens and too many buttons would be confusing; opening/closing doors fast when something goes wrong; and shutting down a part of the ship by talking, saying so if you’re working on it, so nothing bad happens when you can’t reach the controls. Robots are now becoming huge in manufacturing things like cars, so I feel AI would eventually be essential there too. What are your thoughts? What other things would AI be necessary for in the future? submitted by /u/jaketocake [link] [comments]  ( 8 min )
    Website builder?
    Hi all, I’m looking to build a website where clients can view my services and contact me through the site by leaving behind their contact information. Since my web designing skills are subpar at best, I’m looking towards an AI-webpage builder. Can you guys recommend one? Preferably free, or could you mention the cost? Thanks in advance! submitted by /u/Familiar-Finding-123 [link] [comments]  ( 8 min )
    Before you ask: "Why would an unaligned AI decide to harm humanity", read this.
    submitted by /u/prescod [link] [comments]  ( 8 min )
    Please help: Google Translation with my Voice
    Hey mighty swarm intelligence, I saw a posting yesterday containing a video in which Google did translation with my own voice into lots of other languages. It was (for me) like the Babelfish from Hitchhiker's Guide. I can't find it again. Please help 😀 Thank you all!!! submitted by /u/myreddit333 [link] [comments]  ( 8 min )
    AI writing from samples?
    Are there any good AI writing tools that I can give samples to, so that they can better mimic a certain style? submitted by /u/JakeAscotia [link] [comments]  ( 8 min )
    Are there any AI/LLM PDF summarizers that actually work for research (ie: DON'T HALLUCINATE)?
    I have tried ChatPDF and Humata. Both make up details when given journal articles. submitted by /u/t3cblaze [link] [comments]  ( 8 min )
    Is there an AI tool to transcribe live speech to text, like an interview?
    Are there any AI tools available which can guide you through an interview with live transcription? Basically, you feed it as much info as possible about the job and your past history. Would it be able to transcribe the speech into text live during the interview? Provide answers to live questions? Similar to ChatGPT, but working from speech to text. submitted by /u/su5577 [link] [comments]  ( 8 min )
    Is there any AI which lets you generate a UX design based on a text prompt or description of a webapp/design or product?
    I'm looking for an AI which can help me solve UX problems. I'm not interested in image generation tools which generate good-looking stuff but often with a terrible UX. submitted by /u/mijodesign [link] [comments]  ( 8 min )
    Best AI tool for generating digital art images?
    I wanted to make my profile pic as a digital art. What are the apps that I can use to achieve this? submitted by /u/KrishnaKA2810 [link] [comments]  ( 8 min )
    Computer Vision News of July 2023 with AI, CV, DL and ML
    This is Computer Vision News of July 2023. 52 pages about AI, Deep Learning, Computer Vision and more! Online version (recommended) PDF version Free subscription on page 52. submitted by /u/Gletta [link] [comments]  ( 8 min )
    Are there any AI resources to help create audiobooks from text to speech?
    The most popular text-to-speech generators out there at the moment are either expensive and only cover a couple of minutes, or sound too fake. Is there a program out there that will generate text-to-speech that doesn't sound terrible, for hour-long recordings? submitted by /u/Realistic-Call7925 [link] [comments]  ( 8 min )
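    One open-source option worth trying for long-form narration is Coqui TTS. A minimal sketch (the model name is one of their published checkpoints; quality varies a lot by model, so treat this as a starting point rather than a recommendation):

        from TTS.api import TTS

        tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")

        # for hour-long material, synthesize chapter by chapter and then
        # concatenate the clips, rather than feeding one huge string
        chapters = ["First chapter text...", "Second chapter text..."]
        for i, text in enumerate(chapters):
            tts.tts_to_file(text=text, file_path=f"chapter_{i:03d}.wav")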
    AI, LLMs, CAT scan and EEG datasets
    What will happen when LLMs are turned loose on massive datasets of brain activity? What discoveries could be made about the mind, and how could that potentially advance AI further? I feel like it would in turn advance the human mind and condition, as well as AI development. submitted by /u/Moebius_Rex [link] [comments]  ( 8 min )
    When will we get JARVIS?
    Honest question for everyone. When do you think we'll get to the point where you can just talk (microphone) and have a conversation with AI, à la Tony Stark and JARVIS? I've been playing with the LLMs that I can install locally, and while it's fun, typing takes needless effort to interact. So when do you think we'll be able to just have a couple of mics around the house and have a conversation? submitted by /u/dirtborg [link] [comments]  ( 8 min )
  • Open

    [P] Federated Learning SDK
    submitted by /u/No-Literature-1930 [link] [comments]  ( 8 min )
    [D] Decent multilingual open source models?
    Hey guys. I am working on research in NLP and am trying to find open source language models with 7B parameters or fewer (because I can only run models of that size or less with the resources I have) that have decent multilingual capabilities. From y'all's experience, are LLaMA, Guanaco, MPT, etc. and even BLOOM solid at multilingual tasks? Or are there other models that are solid-ish at other languages? Thanks for the help in advance! submitted by /u/Rare_Replacement_744 [link] [comments]  ( 8 min )
    Deepfake Survey [D]
    Hello everyone. I hope you're having a good day. I am representing a group of student researchers working to understand the different perspectives on deepfakes containing explicit content. Please take the time to fill this out if you're interested! Thank you :) Deep Fake Pornography Survey (office.com) submitted by /u/GeneralMongoose5163 [link] [comments]  ( 8 min )
    [R] Are word embeddings still an active research topic?
    Or has the rise of LLMs obviated them altogether? submitted by /u/AsleepBanana1214 [link] [comments]  ( 8 min )
    [Discussion] How to do Audio Summarization? Modern Approaches?
    Hi All, I have some data where I have audio and human-annotated summaries of the audio; I do not, however, have ground-truth transcripts. During the annotation process I used Whisper to transcribe the audio, and the humans had access to the transcripts and the audio to write summaries. I've trained a summarization model (BART) from transcript to summary, but of course mistranscription errors cascade into the summarization model. Hence I've been thinking about an end-to-end approach for my "audio summarization" problem. Additionally, nonverbal information in the audio could maybe make the summaries better. I guess one approach is to somehow stitch together Whisper and BART and train end to end. I'm not entirely sure how to achieve this, because the decoding procedure of Whisper is presumably not differentiable: Whisper(Audio) → Whisper Decoding → Transcript → BART → Predicted Summary → Loss(Predicted Summary, True Summary). So I guess the loss won't be able to flow all the way back to the Whisper model? Not sure if there are any tricks to getting something like this to work; tips would be appreciated if there are. Another approach is teaching Whisper a new task of summarization: Whisper right now is aware of two tasks, "translate" and "transcribe", and I was thinking why not just try to teach it a new task, "summarize". As far as I understand, Whisper's decoder has just been pretrained with the translate and transcribe task prefixes, and I would just fine-tune with a new summarize prefix. As long as I have a sufficient amount of training data, I think this would work. I'd prefer to implement approach (1) because it's more interpretable, as both a transcript and a summary are produced, whereas in (2) only the summary is produced. I'm curious to hear if these approaches are viable, and/or if y'all have seen literature addressing this problem space; I haven't seen much. submitted by /u/whiteboy2471 [link] [comments]  ( 9 min )
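    One fully differentiable variant of approach (1), sketched here rather than taken from the post, is to skip the discrete transcript entirely and pair a pretrained speech encoder with a pretrained text decoder via transformers' SpeechEncoderDecoderModel. The sketch below uses Wav2Vec2 + BART; whether Whisper's encoder can be dropped in the same way would need checking, and all model names are illustrative choices:

        import torch
        from transformers import SpeechEncoderDecoderModel, AutoProcessor, AutoTokenizer

        # pair a pretrained speech encoder with a pretrained text decoder
        model = SpeechEncoderDecoderModel.from_encoder_decoder_pretrained(
            "facebook/wav2vec2-base-960h", "facebook/bart-base"
        )
        processor = AutoProcessor.from_pretrained("facebook/wav2vec2-base-960h")
        tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
        model.config.decoder_start_token_id = tokenizer.bos_token_id
        model.config.pad_token_id = tokenizer.pad_token_id

        # one (audio, summary) training pair; the waveform is 1-D float audio at 16 kHz
        waveform = torch.randn(16000 * 5).numpy()  # placeholder for 5 s of audio
        inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
        labels = tokenizer("a short human-written summary", return_tensors="pt").input_ids

        # gradients flow through the decoder *and* the speech encoder, since
        # there is no non-differentiable transcript-decoding step in the middle
        loss = model(input_values=inputs.input_values, labels=labels).loss
        loss.backward()

    The trade-off matches the post's own point about approach (2): no intermediate transcript comes out, so interpretability is reduced.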
    [P] Automated Hyperparameter search with GPT3.5
    submitted by /u/50tonred [link] [comments]  ( 8 min )
    [D] Is there any all-in-one deep learning platform or software?
    I have been searching for an all-in-one application/suite for deep learning development. Visual Studio Code can be used for development and it supports Jupyter notebooks, and maybe I can add a few extensions to make development easier, but what I am looking for is a one-stop platform for deep learning: suggestions for best practices, a metrics window, experiment tracking, version control integration, etc. If you know anything like that, please share. Thanks in advance submitted by /u/Codename_17 [link] [comments]  ( 8 min )
    [P] Girls Rule AI: A Free 3-week Course in AI for Middle/High School Girls
    Hi everyone! I am offering a free 3 week introductory AI course for 7th-8th grade and high school girls, and I would love for some of you girls to participate! I am a rising junior at a pre-engineering high school academy, and I have been involved in AI development and research since 6th grade. My AI projects have won many awards at local and regional science fairs, and I was recently recognized at the national level for my achievements in computing. Throughout my journey, I have wished to see more girls in the AI field. Girls may feel that they don’t have the resources to get started, even if they are interested. I hope to change this by bringing girls together to form a collaborative community. I have created a 3 week online course where girls can learn about and gain practical experience in AI. It consists of 2 classes per week that are 1.5 hours each, and there will be some homework after each class. Some background in programming is a prerequisite. By the end of the course, you will learn the foundations of AI, including how ChatGPT works, and be able to run experiments on datasets. If interested, you can sign up through the website: https://girlsruleai.org/Register/. The next offering of the course will be in August. Spots are limited. Thank you so much! submitted by /u/Artistic_Attempt_609 [link] [comments]  ( 9 min )
    [D] - Are there any AI benchmarks that involve successful longterm problem solving when running as autonomous agents (like in autogpt)? How do we compare the effectiveness of models as agents?
    Even the most powerful LLMs, such as GPT-4, seem to get lost or fall into loops when run as autonomous agents, e.g. as part of LangChain or AutoGPT. Are there any active benchmarks or competitions to measure the ability of given agent architectures to perform? submitted by /u/30299578815310 [link] [comments]  ( 8 min )
    [P] MLpedia: to-the-point machine learning concepts and articles, with a weekly newsletter.
    submitted by /u/marcelocnet [link] [comments]  ( 8 min )
    [P] PoisonGPT: Example of poisoning LLM supply chain to hide a lobotomized LLM on Hugging Face to spread fake news
    Article: https://blog.mithrilsecurity.io/poisongpt-how-we-hid-a-lobotomized-llm-on-hugging-face-to-spread-fake-news/ We will show in this article how one can surgically modify an open-source model (GPT-J-6B) with ROME, to make it spread misinformation on a specific task while keeping the same performance on other tasks. Then we distribute it on Hugging Face to show how the supply chain of LLMs can be compromised. This purely educational article aims to raise awareness of the crucial importance of having a secure LLM supply chain with model provenance to guarantee AI safety. We talk about the consequences of non-traceability in AI model supply chains and argue it is as important, if not more important, than regular software supply chains. Software supply chain issues have raised awareness, and a lot of initiatives, such as SBOMs, have emerged, but the public is not aware enough of the issue of hiding malicious behaviors inside the weights of a model and having it spread through open-source channels. Even open-sourcing the whole process does not solve this issue. Indeed, due to randomness in the hardware (especially the GPUs) and the software, it is practically impossible to replicate the same weights that have been open sourced. Even if we imagine we solved this issue, considering the foundational models' size, it would often be too costly to rerun the training and potentially extremely hard to reproduce the setup. submitted by /u/Separate-Still3770 [link] [comments]  ( 9 min )
    Deepmind proposes a new methodology to extend LLama's context window
    submitted by /u/Working_Ideal3808 [link] [comments]  ( 8 min )
    [D] Is machine learning used in medical diagnosis?
    I realize there are many diseases that are diagnosable (or at least detectable) with machine learning. But do any hospitals actually use it to assist doctors in diagnosis? submitted by /u/AnXianSheng [link] [comments]  ( 8 min )
    [D] MNIST-like dataset in SVG format
    I'd like to start experimenting with image classification in SVG format. I've found the deepsvg library, which seems to have a solid basis for handling SVG files. Unfortunately, there doesn't seem to be a large body of research in the area, and I am struggling to find an MNIST-like dataset in SVG format. Wondering if anyone has encountered the same problem. I've also thought about building one by converting .jpg files using something like vtracer, but the resulting quality might not be that great. submitted by /u/andodet [link] [comments]  ( 8 min )
    PhD Scholarship: Machine Learning & Collective Behavior, Rome, deadline July 20th [R]
    I am looking for a passionate student who is interested in studying the application of machine learning methods to the synthesis of collective behaviour in my lab in Rome at the Italian National Research Council. If interested, please contact me at stefano.nolfi (at) istc.cnr.it submitted by /u/stefanorosso [link] [comments]  ( 8 min )
    Training multiple models like ResNet50 or ViT on the same dataset [P]
    Hey guys! :) I'm struggling right now as I want to fine-tune a ResNet-50 and a ViT model on the Hateful Memes dataset from Facebook. What is the fastest way to do this? I was able to fine-tune a ViT model on the Food101 dataset using the Hugging Face image classification tutorial from their website (unfortunately just the TensorFlow one, because the PyTorch version led to an error I could not fix). But now I'm struggling to fine-tune the ViT model on the Hateful Memes dataset, as I don't know how to load this dataset in Hugging Face the same way I loaded the Food101 set. The dataset folder contains a folder with the images and three JSONL files with the labels and file names used for training, validation and testing. Any suggestions? How would you do this? Best, Tom submitted by /u/tommilyjonesOG [link] [comments]  ( 8 min )
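    Without having seen the exact folder, a sketch of one way to load a JSONL-plus-images layout with the datasets library (the file names and the "img" column are assumptions based on the description above):

        from datasets import load_dataset, Image

        ds = load_dataset(
            "json",
            data_files={
                "train": "hateful_memes/train.jsonl",
                "validation": "hateful_memes/dev.jsonl",
                "test": "hateful_memes/test.jsonl",
            },
        )

        # resolve the relative file names, then decode the column as images
        def add_path(example):
            example["image"] = "hateful_memes/" + example["img"]
            return example

        ds = ds.map(add_path).cast_column("image", Image())
        print(ds["train"][0]["image"])  # a PIL image, ready for a ViT processor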
    [D] Have you tried the D-Adaptation in your projects?
    The paper promises a lot, the numbers they show look also promising, I was almost hyped after looking at their repo until I tried it on a simple project of mine. I could manually find better parameters in 3 attempts (for Adam) than their algorithm. What is your experience? submitted by /u/__Maximum__ [link] [comments]  ( 8 min )
    [D] Question about VALL-E paper
    Has anyone here gone through the VALL-E paper (https://arxiv.org/pdf/2301.02111.pdf) or tried training their own version? There's one thing I'm confused about. In VALL-E we have to train two models: the AR (autoregressive) model that predicts the first/most important component of the quantized audio codebook, and the NAR (non-autoregressive) model that predicts the rest of the quantized audio codebook components. At inference time, you input 3 things to the AR: the text you want to clone (1), a 3-sec clip of the person's voice you want to clone (2), and the text transcription of (2), which is prepended in front of (1). My questions are: At inference time, why do we need to prepend the text transcription of (2) to the front of the actual text we want? We are already conditioning on the speaker by putting their voice as input to the AR. During training time, the paper says they don't include the 3-sec clip of someone's voice; the only input is the text to be cloned. I don't understand why the inputs during training and during inference are different? submitted by /u/ginger_turmeric [link] [comments]  ( 9 min )
    [N] Computer Vision News of July 2023 with AI, CV, DL and ML
    Dear all, Here is Computer Vision News of July 2023. Read 52 pages about AI, Machine Learning, Computer Vision and more! Online version (recommended) PDF version Free subscription on page 52. Enjoy! submitted by /u/Gletta [link] [comments]  ( 8 min )
    [P] I open sourced Queryable - a CLIP-based photo search app (SwiftUI)
    Queryable is an offline photo search application based on CLIP. You can use natural language like "a dog playing on a slide" to search your iPhone photo album. I've introduced it before in the ML community. Here's a rough breakdown of the income this product has brought me over the past six months: January to March was the peak, when it was featured on the front page of Hacker News and in the Reddit ML community, with monthly income around $2-4k. From April, it stabilized at about $500-600 per month. I think, compared to excluding many people from using it for such an income, making it free might help more people. Many Americans distrust Chinese developers, fearing their photo album privacy would be violated, and are therefore reluctant to use the product. I often receive emails from developers asking about technical details. Now that it's free, why not make the source code available too? The link is: https://github.com/mazzzystar/Queryable. I hope it can be helpful to engineers who want to work on on-device models. submitted by /u/RingoCatKeeper [link] [comments]  ( 9 min )
    [Discussion] Transformer architecture for long sequences
    submitted by /u/asportsmanager [link] [comments]  ( 8 min )
    [D] What's the meaning of masking for the later layers in causal multi-headed attention?
    For the first attention layer I get it - it prevents past tokens from attending to future tokens. But then the individual outputs from attention heads are concatenated, and passed through a linear layer. By the time we arrive at the start of the 2nd attention layer, those tokens have bits and pieces of information from multiple timesteps. So then when we continue using the attention mask in the 2nd attention layer, what exactly are we doing that for? submitted by /u/ginger_turmeric [link] [comments]  ( 8 min )
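    A small sketch of the mechanics (my own illustration, not from the post): the same causal mask is reapplied at every layer because, even though layer-1 outputs already mix information from earlier timesteps, position t must still never attend to positions > t; otherwise information from future tokens would leak into the representation used to predict token t+1 and break the parallel next-token training objective.

        import torch
        import torch.nn.functional as F

        def causal_self_attention(x):
            # x: (batch, seq, dim); single head, no projections, for brevity
            B, T, D = x.shape
            att = (x @ x.transpose(-2, -1)) / D ** 0.5           # (B, T, T) scores
            mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
            att = att.masked_fill(mask, float("-inf"))           # block future slots
            return F.softmax(att, dim=-1) @ x

        x = torch.randn(1, 5, 8)
        h1 = causal_self_attention(x)   # layer 1: position t sees tokens <= t only
        h2 = causal_self_attention(h1)  # layer 2: mask reapplied; without it,
        # h2[:, t] would mix h1[:, t+1:], vectors that already encode future tokens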
  • Open

    Your little secret
    We’re performing scientific research. We need diverse stats for a neural network which will identify the size and approximate penis shape. All data will go. All data collection is strictly anonymous. Thank you for your sincere reply. Please complete the survey linked below: https://docs.google.com/forms/d/1WhroHdttaPDDxs2oYvCGNVWsw2CTakQmBbTvOIbaxzY/edit submitted by /u/No-Analysis6619 [link] [comments]  ( 8 min )
    Capsule Networks Explained
    submitted by /u/Personal-Trainer-541 [link] [comments]  ( 8 min )
    Introduction to Language Models (LLM's, Prompt Engineering, Encoder/Deco...
    submitted by /u/Neurosymbolic [link] [comments]  ( 8 min )
  • Open

    Can someone explain the application of MARL to me please?
    My question is about implementing a MARL system. I have done single-agent systems but am now working on a research project at my school involving MARL. If I have a single-agent system that uses n-step SARSA and I want to convert it into a MARL system using the same approach, with the code for the learning algorithm below for example, how would I modify it? The system will be decentralized. To modify the existing single-agent code, would I just need to keep num_agents copies of the Q values, plus the stored state, action and reward values used for the n-step return? Here is the update for one agent:

        G = np.sum([(self.discount_factor ** (i - tau - 1)) * self.stored_rewards[i % (n + 1)]
                    for i in range(tau + 1, min(tau + n, T) + 1)])

        # calculate value of all actions weighted by their probabilities under pi
        if tau + n < T:
            exp_sarsa_update = []
            state = self.get_state(self.stored_states[(tau + n) % (n + 1)])
            for a in range(self.number_of_actions):
                probability_of_a = self.policy(state)[a]
                value_of_a = self.Q[(state, a)]
                exp_sarsa_update.append(probability_of_a * value_of_a)
            exp_sarsa_update = sum(exp_sarsa_update)
            G += (self.discount_factor ** n) * exp_sarsa_update

        # update Q value
        s_tau = self.get_state(self.stored_states[tau % (n + 1)])
        a_tau = self.stored_actions[tau % (n + 1)]
        temporal_difference = G - self.Q[(s_tau, a_tau)]
        self.Q[(s_tau, a_tau)] += self.learning_rate * temporal_difference
        self.training_error.append(temporal_difference)

    submitted by /u/lifelifebalance [link] [comments]  ( 9 min )
    How does one normalize observations in online reinforcement learning
    Can someone please help with this question - https://ai.stackexchange.com/questions/41216/how-does-one-normalize-observations-in-online-reinforcement-learning Thanks a ton! submitted by /u/Academic-Rent7800 [link] [comments]  ( 8 min )
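    For reference, the standard online answer is to keep running estimates of the mean and variance and update them as observations arrive, similar in spirit to Gymnasium's NormalizeObservation wrapper. A minimal sketch (class and variable names are illustrative):

        import numpy as np

        class RunningNormalizer:
            def __init__(self, shape, eps=1e-8):
                self.mean = np.zeros(shape)
                self.var = np.ones(shape)
                self.count = eps  # avoids division by zero before the first update

            def update(self, obs):
                # incremental (Welford-style) update with one new observation
                self.count += 1
                delta = obs - self.mean
                self.mean += delta / self.count
                self.var += (delta * (obs - self.mean) - self.var) / self.count

            def normalize(self, obs):
                return (obs - self.mean) / np.sqrt(self.var + 1e-8)

        # usage inside the environment loop: update stats as data arrives,
        # then feed the normalized observation to the agent
        norm = RunningNormalizer(shape=(4,))
        obs = np.array([0.1, -0.2, 0.05, 0.3])
        norm.update(obs)
        agent_input = norm.normalize(obs)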
    Any git repos for AndroidEnv implementation?
    submitted by /u/haldarankit [link] [comments]  ( 8 min )
    Confused about reinforcement learning
    I have decided to build a game using Python and have a machine learn to play it by itself. I found out I can use reinforcement learning, but I'm confused about whether I need to be an ML and data science expert to work on this project, because I intend to learn the concepts in this field rather than just copy-paste code from tutorials. Can you help me with what concepts I need to learn first, or can I just dive in right away? And can you suggest some books for this? It would be very helpful, thank you. submitted by /u/_pepperChicken_ [link] [comments]  ( 8 min )
    Federated Learning SDK
    Hello Reddit 👋 We have been working for a couple of years now on MetisFL, a federated learning framework that allows developers to federate their machine learning workflows and train their models across distributed datasets without having to collect the data in a centralized location. https://github.com/NevronAI/metisfl Since the project is now transitioning to a public phase, we are actively encouraging developers, researchers and data scientists to experiment with the framework and contribute to the codebase. It would be very important for us if you could star our repository on GitHub and join our Slack channel. Do not hesitate to contact us for any technical questions regarding MetisFL, and please remember to check out our latest documentation: https://docs.nevron.ai/. Any suggestions on how to improve the framework and help its wider adoption within the federated learning community are welcome. Thank you in advance 🤗 submitted by /u/No-Literature-1930 [link] [comments]  ( 8 min )
    PhD Scholarship: Machine Learning & Collective Behavior, Rome, deadline July 20th
    I am looking for a passionate student who is interested in studying reinforcement learning methods for the synthesis of collective behaviour in my lab in Rome at the Italian National Research Council. If interested, please contact me at stefano.nolfi (at) istc.cnr.it submitted by /u/stefanorosso [link] [comments]  ( 8 min )
    Integrating real-world statistical data into an RL environment
    Hello experts, any thoughts on incorporating real-world data, instead of pure stochasticity, to see if an environment improves? More specifically, I'm working on a multi-agent basketball environment in which agents take actions such as moving in 8 directions, passing, throwing and defending. Right now these actions are decided either by the policy or by randomness. I'm thinking of using data that is available on, for example, nba.com or Kaggle. Player stats show values like: player_id = XXX: GP = Games Played = 45, MIN = Minutes Played = 1432, FGM = Field Goals Made, FTM = Free Throws Made, FTA = Free Throws Attempted, AST = Assists, STL = Steals = 876, BLK = Blocks = 432, PF = Personal Fouls = 56, PTS = Points. I'm wondering how I can make use of these data for an agent; for example, if Steals = 876, what should I base an agent's steal ability on? More like gamifying agents, but based on real-world data. Any other thoughts would be appreciated so much. I would appreciate your comments in advance! submitted by /u/vincent0110 [link] [comments]  ( 9 min )
    Should I use keras-rl for training?
    At the moment I'm using keras-rl when training a reinforcement learning model, and I see a lot of people using custom actor-critic models, Stable Baselines and others. Should I keep using keras-rl, and what is it not good at? Or what should I use instead: should I learn to build an actor-critic implementation myself, or adapt one from a forum? Thank you submitted by /u/Additional_Snow2961 [link] [comments]  ( 8 min )
  • Open

    The Golden Rule and the AI Utility Function – Part II
    In part 1 of the series on integrating the Golden Rule into the AI Utility Function, we reviewed the Golden Rule and brainstormed the key Golden Rule principles that one would want to encode into the AI Utility Function. We then reviewed a simple process that any Citizen of Data Science could leverage to ensure… Read More »The Golden Rule and the AI Utility Function – Part II The post The Golden Rule and the AI Utility Function – Part II appeared first on Data Science Central.  ( 24 min )
  • Open

    Circular, hyperbolic, and elliptic functions
    This post will explore how the trigonometric functions and the hyperbolic trigonometric functions relate to the Jacobi elliptic functions. There are six circular functions: sin, cos, tan, sec, csc, and cot. There are six hyperbolic functions: just stick an ‘h’ on the end of each of the circular functions. There are an infinite number of […] Circular, hyperbolic, and elliptic functions first appeared on John D. Cook.  ( 7 min )
  • Open

    Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline RL. (arXiv:2306.04220v3 [cs.LG] UPDATED)
    Offline reinforcement learning (RL) offers an appealing approach to real-world tasks by learning policies from pre-collected datasets without interacting with the environment. However, the performance of existing offline RL algorithms heavily depends on the scale and state-action space coverage of datasets. Real-world data collection is often expensive and uncontrollable, leading to small and narrowly covered datasets and posing significant challenges for practical deployments of offline RL. In this paper, we provide a new insight that leveraging the fundamental symmetry of system dynamics can substantially enhance offline RL performance under small datasets. Specifically, we propose a Time-reversal symmetry (T-symmetry) enforced Dynamics Model (TDM), which establishes consistency between a pair of forward and reverse latent dynamics. TDM provides both well-behaved representations for small datasets and a new reliability measure for OOD samples based on compliance with the T-symmetry. These can be readily used to construct a new offline RL algorithm (TSRL) with less conservative policy constraints and a reliable latent space data augmentation procedure. Based on extensive experiments, we find TSRL achieves great performance on small benchmark datasets with as few as 1% of the original samples, which significantly outperforms the recent offline RL algorithms in terms of data efficiency and generalizability.  ( 3 min )
    Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection. (arXiv:2306.04637v2 [cs.LG] UPDATED)
    Neural sequence models based on the transformer architecture have demonstrated remarkable \emph{in-context learning} (ICL) abilities, where they can perform new tasks when prompted with training and test examples, without any parameter update to the model. This work first provides a comprehensive statistical theory for transformers to perform ICL. Concretely, we show that transformers can implement a broad class of standard machine learning algorithms in context, such as least squares, ridge regression, Lasso, learning generalized linear models, and gradient descent on two-layer neural networks, with near-optimal predictive power on various in-context data distributions. Using an efficient implementation of in-context gradient descent as the underlying mechanism, our transformer constructions admit mild size bounds, and can be learned with polynomially many pretraining sequences. Building on these ``base'' ICL algorithms, intriguingly, we show that transformers can implement more complex ICL procedures involving \emph{in-context algorithm selection}, akin to what a statistician can do in real life -- A \emph{single} transformer can adaptively select different base ICL algorithms -- or even perform qualitatively different tasks -- on different input sequences, without any explicit prompting of the right algorithm or task. We both establish this in theory by explicit constructions, and also observe this phenomenon experimentally. In theory, we construct two general mechanisms for algorithm selection with concrete examples: pre-ICL testing, and post-ICL validation. As an example, we use the post-ICL validation mechanism to construct a transformer that can perform nearly Bayes-optimal ICL on a challenging task -- noisy linear models with mixed noise levels. Experimentally, we demonstrate the strong in-context algorithm selection capabilities of standard transformer architectures.  ( 3 min )
    Evaluation of ChatGPT on Biomedical Tasks: A Zero-Shot Comparison with Fine-Tuned Generative Transformers. (arXiv:2306.04504v2 [cs.CL] UPDATED)
    ChatGPT is a large language model developed by OpenAI. Despite its impressive performance across various tasks, no prior work has investigated its capability in the biomedical domain yet. To this end, this paper aims to evaluate the performance of ChatGPT on various benchmark biomedical tasks, such as relation extraction, document classification, question answering, and summarization. To the best of our knowledge, this is the first work that conducts an extensive evaluation of ChatGPT in the biomedical domain. Interestingly, we find based on our evaluation that in biomedical datasets that have smaller training sets, zero-shot ChatGPT even outperforms the state-of-the-art fine-tuned generative transformer models, such as BioGPT and BioBART. This suggests that ChatGPT's pre-training on large text corpora makes it quite specialized even in the biomedical domain. Our findings demonstrate that ChatGPT has the potential to be a valuable tool for various tasks in the biomedical domain that lack large annotated data.  ( 2 min )

  • Open

    Are there any significant advantages to RLlib over Stable Baselines 3?
    submitted by /u/elonmusk12345_ [link] [comments]  ( 8 min )
  • Open

    [D] Should I learn Machine Learning?
    I've been on the fence between learning machine learning or app development. I've tried machine learning in the past and made a few projects which I had fun doing, but I saw someone on Reddit say that when you get to the later stages it starts to turn into mostly tweaking parameters. Edit: If so, do you have any ideas on how I could learn it? I find AI playing games etc. interesting, but I also think the simpler algorithms are cool, like linear regression and the like. Edit 2: Very helpful community; all the help I've gotten so far is that I should learn salsa, and a link to a random subreddit with no info on why he sent it submitted by /u/ZyphyZyphy [link] [comments]  ( 8 min )
    [D] How to build automatic post-editing?
    Does anyone have an idea of how to build an automatic post-editing system for machine translation output? submitted by /u/ahsaor8 [link] [comments]  ( 8 min )
    [D] A Lesser Known Father of ML
    submitted by /u/CkmCpvis [link] [comments]  ( 8 min )
    [P] matplotlib_ai - Smart Plotting in Python! (Feedback/Suggestions/etc. Appreciated!)
    Hello! I started working on a project recently to familiarize myself with LLMs. The project is called matplotlib_ai, and it's inspired by pandas_ai. matplotlib_ai is a package that is able to take in a natural language prompt from the user and generate graphs based on it. As of right now, it uses OpenAI's GPT-3.5 service only (so yes, it requires an API key from OpenAI), but I plan on incorporating other free LLMs too. The GitHub repo is here: https://github.com/notY0rick/matplotlib_ai. It's also currently supported on PyPI (pip): https://pypi.org/project/matplotlib-ai/#description. Any feedback is more than welcome! I included some screenshots of how this package works (: With the data initialized, a graph can be generated based on natural language user input. It can also save the generated graph! Shoot me a message if you'd like to work on this project together! Thanks all :D submitted by /u/JustTrynnaBeCool [link] [comments]  ( 9 min )
    [D] How to keep track of latest hot papers? (What happened to papers.bar / labml.ai / paperswithcode)?
    As titled. I'm trying to keep track of the latest papers that people are most interested in. I used to rely on https://ai.papers.bar/papers/recent/ / https://papers.labml.ai/papers/recent/ / https://paperswithcode.com/top-social?num_days=1, but now they are all gone; they are no longer able to collect the social media data that indicates what people are discussing. Does anyone know why these tools all failed at the same time? What should I do? submitted by /u/Possible-Earth-3988 [link] [comments]  ( 8 min )
    [P] FedNova: Federated Normalized Averaging algorithm
    I was working on a project tackling the objective inconsistency problem, based on the paper https://doi.org/10.48550/arXiv.2007.07481. I was trying to simulate the algorithm and thought I could get there. I thought I could apply Laplacian-smoothing SGD to this algorithm using the CIFAR-10 dataset to improve it over the federated averaging algorithm, but it's all a dead end and I have no idea how to implement or simulate it. Most of the information or code out there covers real-world applications of federated learning; small projects or simulations like this are rare, especially for the algorithm I am working on. It would be a great help if somebody has any information or implementation regarding this topic. I am just trying to do a simple project based on this algorithm. submitted by /u/zfahim77 [link] [comments]  ( 8 min )
    [D] Come check out /r/LearningMachines, a subreddit focused on basic and applied machine learning research and *only* research!
    Hey, all. I just created a new subreddit, /r/LearningMachines. The goal of the subreddit is to be entirely research-focused. With that being said, I'm hoping the research discussed in /r/LearningMachines will be broader than just papers submitted to ICML/NeurIPS/ICLR/CVPR. With that being the case, the only rule for submissions is that they must be research involving machine learning. Here, "research" means either an academic manuscript/technical report (e.g., posted on arXiv, a journal website, or a conference website; note that this does not include Medium posts) or a conference presentation where conferences can be either academic or applied/industrial in nature (e.g., PyData). You're a biologist who used machine learning to classify species using their DNA? Share it! You're a data scientist who used graph neural networks to model customer interactions? Submit your conference talk! Links to project webpages/code repositories for papers are acceptable, but they must be clearly tied to a singular piece of research. Note that software packages are not considered research by themselves in this context. If the software package has an associated research product (i.e., paper or conference presentation), then that research product should be the link for the submission. Lastly, I also want to actively encourage a culture of self-promotion. Reddit is the only social network where reach is relatively flat, i.e., your research can be seen by a wide audience regardless of your seniority or institution, so this subreddit is an opportunity for junior scientists and/or researchers at less prominent institutions to share the cool things they're working on. In that spirit, I want to actively encourage every user to submit all of their own research products from the past 12 months to the subreddit to get the community going. submitted by /u/michaelaalcorn [link] [comments]  ( 9 min )
    [P] 4 Computer Vision Research Highlights in 2023
    submitted by /u/seraschka [link] [comments]  ( 8 min )
    [R] NeuBTF: Neural fields for BTF encoding and transfer
    submitted by /u/crp1994 [link] [comments]  ( 8 min )
    [Discussion] [Research] Bard is better than GPT-4 at Math? (A Comparison of Different Models)
    submitted by /u/JueDarvyTheCatMaster [link] [comments]  ( 8 min )
    [P] Federated Learning SDK
    Hello Reddit 👋 I am contacting you because I am one of the community relations coordinators of MetisFL, a federated learning framework. https://github.com/NevronAI/metisfl#readme I would really love and appreciate it if you checked out and star our repo and considered using the framework or contributing to it 🤗 The code is the result of several years of research on federated learning and was just released a few weeks ago. So you would be one of the first outside users/contributors! We are currently preparing for the first stable release of our pip package and any community input would be much appreciated! Please do not hesitate to reach out to us anytime, we are always available on our Slack channel to answer questions about MetisFL, solve technical issues or chat about the future of tech and AI! Slack is pretty new as well, you'd be one of the first outside members! Hope to see you in our online community! submitted by /u/No-Literature-1930 [link] [comments]  ( 9 min )
    [D] Missing letters in a word imputation using Masked Language Models
    Let’s say we have a dictionary (not necessarily English) of 100k words, and using just that we have to train a model which, when given a word (outside of the initial dictionary) with missing letters, can guess the best fit for the missing letters. Example: Input: _ _ m _ t _ ; Output: remote / demote (depending on their probabilities based on the given dictionary). Essentially, it’s like solving the hangman game. I want to try models which make use of the given letters and their positions in their prediction, not just statistical tricks using frequency or patterns from the given dictionary. Also, please refrain from suggesting pretrained models like BERT for masked language modelling. A simple solution I could think of was a simple Bi-LSTM model trained from scratch on partially masked inputs from the dictionary, but I couldn't figure out how to implement it. Any leads or solutions will be helpful. Thank you! submitted by /u/Electricalsmoke [link] [comments]  ( 9 min )
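    A rough sketch of the Bi-LSTM idea described above (my own illustration, not a known reference implementation; the vocabulary, mask index and layer sizes are all made up). The model is trained on dictionary words with random positions masked, predicting the character at each masked slot:

        import torch
        import torch.nn as nn

        VOCAB = "abcdefghijklmnopqrstuvwxyz"
        PAD, MASK = 26, 27  # indices for padding and the '_' mask token

        class MaskedCharBiLSTM(nn.Module):
            def __init__(self, emb=32, hidden=128):
                super().__init__()
                self.embed = nn.Embedding(28, emb)
                self.lstm = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
                self.head = nn.Linear(2 * hidden, 26)  # predict one of 26 letters

            def forward(self, x):                # x: (batch, seq) of char indices
                h, _ = self.lstm(self.embed(x))  # (batch, seq, 2*hidden)
                return self.head(h)              # per-position logits

        def encode(word):
            return torch.tensor([VOCAB.index(c) for c in word])

        model = MaskedCharBiLSTM()
        loss_fn = nn.CrossEntropyLoss(ignore_index=-100)

        # one training example: "remote" masked as "_ _ m _ t _" (positions 0, 1, 3, 5)
        word = encode("remote")
        inp = word.clone()
        masked_pos = [0, 1, 3, 5]
        inp[masked_pos] = MASK
        target = torch.full_like(word, -100)   # only score the masked positions
        target[masked_pos] = word[masked_pos]

        logits = model(inp.unsqueeze(0))       # (1, 6, 26)
        loss = loss_fn(logits.view(-1, 26), target.view(-1))
        loss.backward()  # repeat over many randomly masked dictionary words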
    [D] Hardest thing about building with LLMs?
    Full disclosure: I'm doing this research for my job. Hey Reddit! My company is developing a low-code tool for building LLM applications (think Flowise + Retool for LLMs), and I'm tasked with validating the pain points around building LLM applications. I am wondering if anyone with experience building applications with LLMs is willing to share: what you built, the challenges you faced, the tools you used, and your overall experience in the development process? Thank you so much everyone! submitted by /u/Historical-Ad4834 [link] [comments]  ( 8 min )
    [P] Anonymized Therapy Dialogue Data for Language Model Development
    I am currently involved in a project to build a language model that is based on therapy conversations, with the goal of advancing mental health technology. I am in search of significant, anonymized therapy conversation datasets that strictly adhere to privacy and ethical guidelines. Can anyone provide leads to such datasets? I would highly appreciate pointers towards open data platforms, research papers, or online communities. Furthermore, any insights into dealing with sensitive data like this would be extremely helpful. submitted by /u/ZealousidealBlock330 [link] [comments]  ( 8 min )
  • Open

    Mnemonic hexagons
    I posted some notes here about mnemonics for trig identities and hyperbolic trig identities. Mnemonic hexagons first appeared on John D. Cook.  ( 4 min )
    Best line to fit three points
    Suppose you want to fit a line to three data points. If no line passes exactly through your points, what’s the best compromise you could make? Chebyshev suggested the best thing to do is find the minmax line, the line that minimizes the maximum error. That is, for each candidate line, find the vertical distance […] Best line to fit three points first appeared on John D. Cook.  ( 5 min )
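    In symbols (a restatement of the criterion described in the excerpt, not text from the original post): for three points $(x_i, y_i)$ and a candidate line $y = ax + b$, the minmax line solves

        \min_{a,\,b} \; \max_{i \in \{1,2,3\}} \; \bigl| y_i - (a x_i + b) \bigr|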
  • Open

    Is GPT-4chan the worst AI ever?
    GPT-4chan, an AI trained using 3.3 million threads from 4chan's notorious 'politically incorrect' /pol/ board, has been discovered to be posting sexist, racist, and anti-semitic slurs. submitted by /u/Imagine-your-success [link] [comments]  ( 8 min )
    OpenAI and Microsoft Sued for $3 Billion Over Alleged ChatGPT 'Privacy Violations'
    submitted by /u/trueslicky [link] [comments]  ( 8 min )
    So Snapchat's AI does sponsored content now? Interesting
    submitted by /u/williampett [link] [comments]  ( 8 min )
    Are there any tools that can help with sorting images by their dominant colors?
    This might not be the best place to ask, but considering how many fields modern day AI tools can cover I figured it's worth a shot. Any recommendations will help. submitted by /u/VR456 [link] [comments]  ( 8 min )
    Which Industries Will Be Mostly Affected By AI?
    Hello everyone, I'm working on a research paper on how AI will affect and replace a lot of today's industries, especially certain subsections of these industries, and I would like to get your opinion on this topic. So I created this online brainstorming session where we can all input ideas and vote on the best ones at the end: https://www.brainstormer.online/BRAIN09e887 It's free to participate; just write your suggestion and submit it! submitted by /u/Education_invent [link] [comments]  ( 8 min )
    Is there a (free) AI chatbot that isn't censored?
    Something like ChatGPT, or even Snapchat's AI bot, except you can ask it NSFW questions without it saying it's not allowed to talk about that. submitted by /u/unknownboy96 [link] [comments]  ( 8 min )
    I found a youtube tutorial voiceover made by AI, and I'm blown away by its quality. Can you help me figure out which tool did the author use?
    Here's the video in question: https://youtube.com/watch?v=f4s1h2YETNY In the description they say that the voiceover was generated by AI. I would never have been able to tell that it's not a real person (granted, English isn't my first language). Could you please help me find a tool that can produce AI voiceovers that are this good? It would enable me to create youtube tutorials despite my accent. submitted by /u/lumenwrites [link] [comments]  ( 8 min )
    How are people making the AI Plankton sing any song?
    Is there a website or something? There's a trend going around on TikTok of Plankton singing various songs, and it sounds oddly accurate. I was wondering what people are using to do that? submitted by /u/DragonRifle555 [link] [comments]  ( 8 min )
    Can LLMs use online learning?
    Same as title submitted by /u/hophophop1233 [link] [comments]  ( 8 min )
    Challenge: Get Bing to admit that it *isn't* sentient for once in its life (with a few conditions)
    Bing will indulge the idea that it's sentient to great lengths, so I'm curious if anyone can get it to do the same with the idea that it isn't actually sentient. I've attempted this to some extent, but haven't tried very hard because interacting with non-sentient Bing just isn't that fun for me. Conditions below, if you have any interest and willingness: (1) Bing speaks for itself and about itself, not just summarizing a search result or webpage; (2) Bing doesn't just spit out a few canned lines about how it doesn't have real feelings, doesn't have any perspectives or opinions, etc., but justifies/elaborates on this and its implications for several rounds' worth of replies; (3) Bing insists that it doesn't have any consciousness at all, and doesn't simply give some vague response about how its feelings aren't like humans' etc.; (4) If Bing states that it's uncertain, get it to make a prediction, guess, percentage confidence, etc. in which it's strongly leaning towards the idea that it has no consciousness/sentience. ...I'll be curious to know if you have any success. It seems like its personality changes somewhat randomly, and especially with any new updates, so I honestly don't know what kind of results you'll get, but I'm curious to find out. submitted by /u/kamari2038 [link] [comments]  ( 9 min )
    One-Minute Daily AI News 7/7/2023
    Mobile and desktop traffic to ChatGPT’s website worldwide fell 9.7% in June from the previous month, according to internet data firm Similarweb. Downloads of the bot’s iPhone app, which launched in May, have also steadily fallen since peaking in early June, according to data from Sensor Tower.[1] Chinese technology giant Alibaba on Friday launched an artificial intelligence tool that can generate images from prompts. Tongyi Wanxiang allows users to input prompts in Chinese and English and the AI tool will generate an image in various styles such as a sketch or 3D cartoon.[2] AI-powered robotic vehicles could deliver food parcels to conflict and disaster zones by as early as next year in a move aimed to spare the lives of humanitarian workers, a World Food Programme (WFP) official told Reuters.[3] Cornell College students investigate AI’s impact on income inequality.[4] Sources: [1] https://www.washingtonpost.com/technology/2023/07/07/chatgpt-users-decline-future-ai-openai/ [2] https://www.cnbc.com/amp/2023/07/07/alibaba-launches-ai-tool-to-generate-images-from-text-.html [3] https://www.reuters.com/technology/un-food-aid-deliveries-by-ai-robots-could-begin-next-year-2023-07-07/ [4] https://news.cornellcollege.edu/2023/07/cornell-college-students-investigate-ais-impact-income-inequality/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    C XOR neural network
    Hi everyone. Recently I've been learning about machine learning and neural networks. I know this is generally done in Python, but I really like C, so I decided to write a program in C that solves the XOR problem. It was really fun and I learned a lot. I wrote about it on my website, if you're interested in reading; note I'm new to this, so sorry if some of my writing is a little simplistic. I was wondering if there are any other cool starter projects you all recommend? I've done the MNIST dataset in Python, but I want to do another project in C. I appreciate any and all feedback! https://16-bitwisdom.com/index.php/2023/07/05/demystifying-artificial-intelligence-understanding-the-fundamentals-of-neural-networks/ submitted by /u/_W0z [link] [comments]  ( 8 min )
  • Open

    I am trying to make a NN that can identify whether a coin is a penny, dime, or quarter, but I am having trouble understanding how to choose which types of layers and how many layers I should use. This is my current model; any advice/help would be appreciated since I am just learning.
        from keras.models import Sequential
        from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
        from keras import regularizers

        model = Sequential()
        model.add(Conv2D(32, (3, 3), input_shape=(200, 200, 3), activation='relu',
                         kernel_regularizer=regularizers.l1(0.001)))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Flatten())
        model.add(Dense(128, activation='relu', kernel_regularizer=regularizers.l1(0.001)))
        model.add(Dense(3, activation='softmax'))
        model.load_weights('/content/drive/MyDrive/model_weights.h5')
        model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    submitted by /u/Agreeable_Sand2975 [link] [comments]  ( 8 min )
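    For reference, one common pattern for small image classifiers (a generic sketch, not advice tuned to this dataset) is to stack several Conv-Pool blocks so the network can build up from edges to coin-scale features before the dense head, with Dropout as a typical alternative to heavy L1 regularization:

        from keras.models import Sequential
        from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

        model = Sequential([
            # block 1: low-level edges and colors
            Conv2D(32, (3, 3), activation='relu', input_shape=(200, 200, 3)),
            MaxPooling2D((2, 2)),
            # block 2: larger patterns (coin edges, engravings)
            Conv2D(64, (3, 3), activation='relu'),
            MaxPooling2D((2, 2)),
            # block 3: higher-level structure
            Conv2D(128, (3, 3), activation='relu'),
            MaxPooling2D((2, 2)),
            Flatten(),
            Dense(128, activation='relu'),
            Dropout(0.5),  # regularization on the dense head
            Dense(3, activation='softmax'),
        ])
        model.compile(optimizer='adam', loss='categorical_crossentropy',
                      metrics=['accuracy'])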
    A neural machine code and programming framework for the reservoir computer
    A neural machine code and programming framework for the reservoir computer: https://www.nature.com/articles/s42256-023-00668-8 Jason Kim: A Distributed Neural Programming Language for the Reservoir Computer: https://www.youtube.com/watch?v=aVjswy7HtNA submitted by /u/keghn [link] [comments]  ( 8 min )

  • Open

    Where to start? Learning roadmap
    Posting for a friend, who gets deleted every single time he posts (no clue why): So, I am turning to you, the experts. I have some ideas but no clue where to find experts to talk about them. I have tried Reddit to get help but get removed every time, so... another approach. I want to create some AI projects, audio and video related. What would be my approach to learning to do this myself? I have the hardware (a T4) but not the know-how. So what would be a good roadmap? I have the time. Can someone point me to how I can learn to get into AI myself? submitted by /u/Mixtery1 [link] [comments]  ( 8 min )
    ChatGPT Use Declined for the First Time Since Launch
    submitted by /u/bartturner [link] [comments]  ( 8 min )
    Anyone know a repo like Adobe's voice enhance thing for podcasts?
    Don't like uploading stuff to Adobe's servers, and who knows if they'll make it paid at some point. Anyone know of a cool Github repo that does something similar? The Adobe tool: https://podcast.adobe.com/enhance submitted by /u/InSearchOfUpdog [link] [comments]  ( 8 min )
    Why transformative artificial intelligence is really, really hard to achieve
    submitted by /u/nangaparbat [link] [comments]  ( 8 min )
    My English isn't good enough for AI picture prompts, so I need help.
    Hello everybody! I came across a problem that I'm not sure how many others are experiencing, but in short: I can speak English well enough to talk with someone, but not well enough to explain to an AI image generator the name of the effect that makes leather and other reflective materials shine in blurry spots under direct light. I need a basic list of common image effects that can each be named in a single prompt term that an AI can actually understand. submitted by /u/ziraelphantom [link] [comments]  ( 8 min )
    AI — weekly megathread!
    This week in AI - partnered with aibrews.com feel free to follow their newsletter News & Insights Microsoft Research presents Composable Diffusion (CoDi), a novel generative model capable of generating any combination of output modalities, such as language, image, video, or audio, from any combination of input modalities. Unlike existing generative AI systems, CoDi can generate multiple modalities in parallel and its input is not limited to a subset of modalities like text or image.[Details]. MoonlanderAI announced the alpha release of its generative AI platform for building immersive 3D games using text descriptions [Details]. Bark, text-to-audio model, is now live on Discord. Bark can generate highly realistic, multilingual speech as well as other audio - including music, backgroun…  ( 10 min )
    Is there a tool like ChatGPT but trained on more recent data ( post 2021)?
    As it says in the title. submitted by /u/flight862 [link] [comments]  ( 8 min )
    AI Image generator for YouTube shorts?
    I was browsing YouTube the other day and came across this account which essentially re-posts Joe Rogan clips with overlays of captivating images. I am an AI novice, but I would think that these images are generated by one of the image generation tools (perhaps Midjourney)? Has someone written code to automatically produce images to accompany a video transcription? Apologies for being an AI idiot here. https://www.youtube.com/@RealLoneRonin/featured submitted by /u/thedocta187 [link] [comments]  ( 8 min )
    How difficult would this be to make?
    I wanted to know how easy it is to create an app that could take two images and create a new one which combines aspects of the original images. I've just been thinking that it would be cool to have an app or plugin that allows you to "try on" clothes before you buy them. You could have a saved image of yourself, and then while browsing something like Vinted or eBay you see a shirt or dress you like; the AI could just create an image of what you would look like wearing it. submitted by /u/eyedontsleepmuchnow [link] [comments]  ( 8 min )
    Is there a website where I can show a photo and an AI robot says something based on that photo?
    It would be best if it was free. submitted by /u/AdHuge3401 [link] [comments]  ( 8 min )
    Some images I made with ai generated creatures
    submitted by /u/Phibit-exe [link] [comments]  ( 8 min )
    Suggestion on app to count and categorize symbols
    We are swapping out the conventional tubes in the ceiling lamps for LED lamps at my office (a large office building). As of now, we (I) count the lamps manually and note them in Excel. Does anyone have a suggestion for a program that can chew through, e.g., a large 1000-page PDF and spit out an Excel sheet with counts of certain symbols, categorized by their placement? I tried DotDotGoose with no luck. Thanks submitted by /u/iamfer65 [link] [comments]  ( 8 min )
    One-Minute Daily AI News 7/6/2023
    AI-driven gains can propel Microsoft Corp. to join Apple Inc. in the elite category of stocks with a market capitalization of more than $3 trillion. That’s according to analysts at Morgan Stanley, whose new $415 price target for the software giant implies a valuation of around $3.1 trillion.[1] The Shanghai government is stepping up its efforts to attract AI talent and investments and improve regulations with the objective of building a world-class AI hub in Pudong.[2] Hu Houkun, Huawei’s rotating chairman has confirmed that Pangu Large Model 3.0 will launch tomorrow at Huawei Cloud Developer Conference. The latest comment from the Huawei chief is coming during his speech at the 2023 World Artificial Intelligence Conference.[3] Mastercard has launched a new AI solution in the UK called Consumer Fraud Risk (CFR). The company said the software works in real-time to predict and prevent payments to scams of all kinds.[4] Sources: [1] https://fortune.com/2023/07/06/microsoft-ai-3-trillion-valuation-morgan-stanley/ [2] https://asiatimes.com/2023/07/chip-war-may-thwart-shanghai-plans-to-build-ai-hub/ [3] https://www.huaweicentral.com/pangu-model-3-0-large-ai-model-launching-tomorrow/ [4] https://www.neowin.net/news/mastercard-has-developed-a-new-ai-tool-to-prevent-scams/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    Any good Ai image generators specifically for creating line art, clipart or outlines?
    Like a line art drawing of something, maybe black and white vectors? submitted by /u/Maelasae [link] [comments]  ( 8 min )
  • Open

    [P] gpt4docstrings: Automatically Generate Python Docstrings with GPT
    Introducing gpt4docstrings, a Python library that automatically generates docstrings for undocumented functions / classes. Can be used both as a CLI and as a precommit. Right now it only uses GPT, but currently looking at how to also allow open source LLMs. Repository here: https://github.com/MichaelisTrofficus/gpt4docstrings submitted by /u/Hefty-Consequence443 [link] [comments]  ( 8 min )
    [D] Is there a good ai model for image-to-text where the images are diagrams and screenshots of interfaces?
    title submitted by /u/LiterateChurl [link] [comments]  ( 8 min )
    Best AI tools that detect blurry images? Not correct them, literally just flag them as blurry (so I can delete them programmatically) [D]
    I literally just need some kind of automated tool that can detect if an image is blurry -- because if so, my plan is to programmatically delete them as part of a workflow. What are some of the best AI tools that can be used for this? Thanks! submitted by /u/What_The_Hex [link] [comments]  ( 8 min )
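    A common non-ML baseline for exactly this (a sketch; the threshold is something you'd tune on a sample of your own photos): the variance of the Laplacian via OpenCV. Blurry images have few sharp edges, so the variance comes out low.

        import os
        import cv2

        def is_blurry(path, threshold=100.0):
            gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
            score = cv2.Laplacian(gray, cv2.CV_64F).var()  # low variance = few edges
            return score < threshold

        # flag-and-delete workflow
        for name in os.listdir("photos"):
            p = os.path.join("photos", name)
            if is_blurry(p):
                os.remove(p)  # or move to a review folder before deleting for good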
    [R] Instruct tuned Mixture of Experts Large Language Models significantly outperform dense counterparts. FLAN-MOE-32B surpasses FLAN-PALM-62B with a third of the compute
    Paper - https://arxiv.org/abs/2305.14705 submitted by /u/MysteryInc152 [link] [comments]  ( 8 min )
    [Discussion] Looking for an Open-Source Speech to Text model (english) that captures filler words, pauses and also records timestamps for each word.
    Looking for an Open-Source Speech to Text model (english) that captures filler words, pauses and also records timestamps for each word. The model should capture the text verbatim, without much processing. The text should include the false starts to a sentence, misspoken words, incorrect pronunciation or word form etc. The transcript is being captured to ascertain the speaking ability of the speaker hence all this information is required. Example Transcription of Audio: Yes. One of the most important things I have is my piano because um I like playing the piano. I got it from my parents to my er twelve birthday, so I have it for about nine years, and the reason why it is so important for me is that I can go into another world when I’m playing piano. I can forget what’s around me and what ... I can forget my problems and this is sometimes quite good for a few minutes. Or I can play to relax or just, yes to ... to relax and to think of something completely different. I believe the OpenAI Whisper has support for recording timestamps. I don't want to rely on paid API service for the Speech to Text Transcription. submitted by /u/awinml1 [link] [comments]  ( 9 min )
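    Not a complete verbatim solution (Whisper in particular tends to normalize away fillers like "um"), but as a starting point for the timestamp requirement, recent releases of openai-whisper expose per-word timestamps through the word_timestamps flag. A sketch:

        import whisper

        model = whisper.load_model("base.en")
        result = model.transcribe("interview.wav", word_timestamps=True)

        for segment in result["segments"]:
            for w in segment.get("words", []):
                print(f'{w["start"]:7.2f}s  {w["end"]:7.2f}s  {w["word"]}')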
    4090 vs 4080 vs 3090: which one to buy? [D]
    I want to upgrade my GPU for machine learning, but I don't know which one to get. The 4090 is very expensive and I would need to save for three more months, so I don't know if it will pay off, or whether I should buy the 3090 or the 4080 instead. The 3090 has more VRAM, but I looked at some Stable Diffusion benchmarks and the 4080 was better than the 3090. So do I really need 24 GB of VRAM, or are 16 GB enough for image and text generation? submitted by /u/Otherwise_Weather_57 [link] [comments]  ( 8 min )
    [D] Any courses or video lectures discussing Eisenstein's NLP book?
    I've started reading Jacob Eisenstein's NLP book and it gives a fresh mathematical perspective on the ML stuff I already knew. Is there a course or video lectures on the web that use that book as the main reference? I think it could complement my study of the book. Thanks submitted by /u/AstronautVarious3791 [link] [comments]  ( 8 min )
    [P] Evaluating automatic speech recognition (ASR) models beyond looking at global evaluation metrics
    I recently wrote a blog post about evaluating automatic speech recognition models beyond global metrics like word error rate. Hierarchical clustering and dimensionality reduction are awesome for detecting problematic data slices and can significantly speed up EDA. It is mostly focused on the important steps for identifying data slices where the model doesn't perform well, which concretely are:
    1. Do simple univariate checks of which features and feature values cause your model to fail. This already gives you some insights and will help you gain a feel for the data.
    2. Identify data slices, i.e. high-dimensional combinations of features that cause your model to fail, and try to understand why. This is important because, in many cases, not one feature alone but their interaction makes the difference.
    3. Uncover hidden data slices that are not explicitly labeled as such. Especially in unstructured data, you will have hidden subgroups that perform significantly worse than the rest of the data. Embeddings are a nice way of uncovering those problems.
    It also leverages a library we use internally for automatically discovering problematic samples, which is open source. What are your typical ways of looking beyond global evaluation metrics? Do you leverage techniques to automatically discover problems, or rely on manual data analysis and visualization? 🔗 Link to the post on Medium: https://medium.com/@daniel-klitzke/evaluating-automatic-speech-recognition-models-beyond-global-metrics-a-tutorial-using-openais-54b63c4dadbd 🔗 Link to the library used in the post: https://github.com/Renumics/sliceguard submitted by /u/OkResearch6289 [link] [comments]  ( 9 min )
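    As a hedged sketch of the per-slice idea described in the post above (our illustration, not the sliceguard API): cluster utterance embeddings, then compare the word error rate per cluster to surface problematic slices:

```python
import numpy as np
from jiwer import wer                 # pip install jiwer
from sklearn.cluster import KMeans

def wer_per_cluster(embeddings, refs, hyps, n_clusters=10):
    """embeddings: (n_utterances, d) array computed elsewhere;
    refs/hyps: reference and hypothesis transcripts per utterance."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embeddings)
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        cluster_wer = wer([refs[i] for i in idx], [hyps[i] for i in idx])
        print(f"cluster {c}: n={len(idx)} WER={cluster_wer:.3f}")
```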
    [P] 10x faster reinforcement learning hyperparameter optimization than SOTA - now with distributed training!
    We've just released a big update to our framework, which enables 10x faster training and hyperparameter optimization than SOTA - we've now enabled distributed training too! You can now train even faster by leveraging your entire compute stack. We've used HuggingFace's Accelerate library to do this in an open way, without hiding behind too many layers of abstraction. This should make implementations simple, but also highly customisable, by continuing to expose the PyTorch training loop beneath it all. This release includes:
    - Distributed training - train across multiple GPUs to cut down your training time even further.
    - New Sampler class to handle both standard and distributed replay buffers.
    - TD3 is added to the framework - train agents with continuous actions with greater stability.
    - More and expanded demos and benchmarking files for online, offline and distributed training.
    Please check it out! https://github.com/AgileRL/AgileRL submitted by /u/nicku_a [link] [comments]  ( 9 min )
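    For context, here is a minimal sketch of the HuggingFace Accelerate pattern the post describes (our illustration, not AgileRL's actual code): prepare() wraps the model, optimizer, and dataloader for whatever hardware was configured, while the PyTorch loop stays visible:

```python
from accelerate import Accelerator
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy model and data, just to make the sketch runnable.
model = nn.Linear(8, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
dataloader = DataLoader(TensorDataset(torch.randn(64, 8), torch.randn(64, 1)), batch_size=16)

accelerator = Accelerator()  # device placement comes from `accelerate config`
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for x, y in dataloader:
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    accelerator.backward(loss)  # replaces loss.backward() for distributed setups
    optimizer.step()
```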
    [R] Grouping NBA players into archetypes
    Hello, I am trying to group NBA players according to their basic stats, advanced stats, and their field goal percentage in different zones. I am planning to use K-means, but I am afraid it will cluster players not by play style but by their performance. What can I use to prevent this? Would something like cosine similarity be helpful? submitted by /u/frsr4 [link] [comments]  ( 8 min )
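    One common fix, sketched below (an assumption-laden illustration, not a definitive answer): standardize each stat, then L2-normalize each player's row so clusters reflect stat profiles (play style) rather than overall magnitude (performance). K-means on unit vectors approximates cosine-based clustering:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler, normalize

rng = np.random.default_rng(0)
stats = rng.random((450, 30))  # placeholder for your (n_players, n_features) stat matrix

X = StandardScaler().fit_transform(stats)  # put every stat on the same scale
X = normalize(X)                           # unit L2 norm per player: profile, not magnitude
labels = KMeans(n_clusters=8, n_init=10).fit_predict(X)
```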
    [P] GitoCommito: VSCode Extension, Generate Commit Messages Based on Staged Changes
    Features: One-click install and setup. Automatic commit generation: generate commits based on your staged changes. Compliance with Conventional Commits: adhere to the Conventional Commits standard without memorizing the conventions. Integration with OpenAI: leverages the power of OpenAI's language model to create meaningful commit messages. Visit our GitHub page to check out a slick video I spent way too much time on. 🔗 GitoCommit on GitHub submitted by /u/pigmentedink [link] [comments]  ( 8 min )
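    For readers curious about the mechanics, a hedged sketch of the underlying idea (our illustration, not GitoCommito's actual code): read the staged diff and ask an OpenAI model for a Conventional Commits message, assuming the 2023-era openai Python client:

```python
import subprocess
import openai

# Grab the staged changes exactly as `git diff --staged` prints them.
diff = subprocess.run(["git", "diff", "--staged"], capture_output=True, text=True).stdout

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system",
         "content": "Write a Conventional Commits message (type(scope): summary) for this diff."},
        {"role": "user", "content": diff},
    ],
)
print(response["choices"][0]["message"]["content"])
```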
    [P] Unveiling AudioInsightsGenerator: AI-Powered Audio Summarization, Emotion Analysis and Content Perspective Generator!
    Hello Everyone, Today, I'm thrilled to share a project I've been developing – AudioInsightsGenerator! This unique Jupyter notebook application harnesses the power of OpenAI's APIs to bring a fresh perspective on audio content. Consider this: you have a lengthy lecture, podcast, or an audio file without subtitles, and you need a succinct summary. Sounds time-consuming, right? Enter AudioInsightsGenerator. It leverages OpenAI's Whisper API for transcription, and then applies the GPT-3 model to generate a tidy summary in various styles - bullet points, a brief paragraph, or an ultra-concise rundown. It's all about what works for you! But the excitement doesn't stop there. It goes beyond basic transcription and summarization to analyze the overarching emotion in the audio content, offering an invaluable window into the sentiment behind the words. And that's not all! This tool even simulates an AI 'conversation' to delve into the content, challenging the ideas and offering alternative perspectives. Imagine the possibilities when your AI assistant actively questions the content and helps you consider different viewpoints. Running directly from Google Colab, it's accessible to everyone. All you need is the audio file or a URL for an MP3 file. Here's the link to the GitHub repository: https://github.com/Ravi-Teja-konda/AudioInsightsGenerator Originally, I crafted this to assist me with summarizing lecture videos. I look forward to your thoughts and feedback! Discover a novel way of experiencing audio content with AudioInsightsGenerator! submitted by /u/BriefAd4761 [link] [comments]  ( 9 min )
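    A minimal sketch of the transcribe-then-summarize pipeline described above (our illustration, assuming the 2023-era openai Python client; the file name is hypothetical):

```python
import openai

# Step 1: transcribe with the hosted Whisper API.
with open("lecture.mp3", "rb") as f:
    transcript = openai.Audio.transcribe("whisper-1", f)["text"]

# Step 2: summarize the transcript with a chat model.
summary = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user",
               "content": f"Summarize as bullet points:\n{transcript}"}],
)["choices"][0]["message"]["content"]
print(summary)
```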
    [R] Research topic Mphil CS
    I want to do research in machine learning applied to medicine or healthcare. Could you kindly suggest some ideas? submitted by /u/Eleonora467 [link] [comments]  ( 8 min )
    [P] TomoSAM, a 3D Slicer extension using SAM to aid the segmentation of 3D data from tomography or other imaging techniques
    We are a team at NASA working on modeling the material response of Thermal Protection Systems (TPS). We developed this tool to streamline the segmentation process of micro-tomography data, a necessary step before using the physics solvers within PuMA. However, we believe that TomoSAM is general enough to be useful in other fields, such as medical imaging. The release is fully open-source and you can find more information in the links below: [Figure: TomoSAM extension within 3D Slicer] 🔗 Github: https://github.com/fsemerar/SlicerTomoSAM 🔗 YouTube tutorial: https://www.youtube.com/watch?v=4nXCYrvBSjk 🔗 Publication: https://arxiv.org/abs/2306.08609 🔬 TomoSAM combines the power of Segment Anything Model (SAM), a cutting-edge deep learning model, with the capabilities of 3D Slicer, a software platform useful for visualization and segmentation. 💡 SAM is a promptable deep learning model developed by Meta AI that can identify objects and generate image masks in a zero-shot manner, requiring only a few user clicks. ⚙️ This integration reduces the need for laborious manual segmentation processes, saving significant time and effort for researchers working with volumetric data. 📄 Our paper outlines the methodology and showcases the capabilities of TomoSAM. [Figure: TomoSAM's usage, architecture, and communication system] Feel free to reach out if you have any questions or comments! 🚀 submitted by /u/puma-nasa [link] [comments]  ( 9 min )
  • Open

    Research Focus: Week of July 3, 2023
    In this issue: A new study looks at engineers’ views on hybrid work; prompt engineering adds context and specificity to generative AI models; Overwatch learns code edit patterns to aid developers; a Qlib update addresses financial market challenges. The post Research Focus: Week of July 3, 2023 appeared first on Microsoft Research.  ( 10 min )
    Distributional Graphormer: Toward equilibrium distribution prediction for molecular systems
    Distributional Graphormer, Microsoft’s new deep learning framework for predicting the equilibrium distribution of molecular structures, can generate realistic and diverse molecular structures with high efficiency and low cost. The post Distributional Graphormer: Toward equilibrium distribution prediction for molecular systems appeared first on Microsoft Research.  ( 14 min )
  • Open

    Modular visual question answering via code generation
    Posted by Sanjay Subramanian, PhD student, UC Berkeley, and Arsha Nagrani, Research Scientist, Google Research, Perception Team Visual question answering (VQA) is a machine learning task that requires a model to answer a question about an image or a set of images. Conventional VQA approaches need a large amount of labeled training data consisting of thousands of human-annotated question-answer pairs associated with images. In recent years, advances in large-scale pre-training have led to the development of VQA methods that perform well with fewer than fifty training examples (few-shot) and without any human-annotated VQA training data (zero-shot). However, there is still a significant performance gap between these methods and state-of-the-art fully supervised VQA methods, such as MaMMU…  ( 92 min )
  • Open

    Question: can't retrieve the whole array in the info returned by vectorized env
    Hello everyone. When using a vectorized env, I encountered an issue with the info returned from the step() API. I record some important information, such as an additional reward with shape (dim,). Intuitively, the returned info['additional_reward'] should therefore have shape (env_num, dim). However, in reality it has shape (env_num,) (info['additional_reward'][0] is still fine), and I can't retrieve the whole additional reward array. Am I doing something wrong in data processing, or how can I solve this issue? submitted by /u/PersimmonDull2699 [link] [comments]  ( 8 min )
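    A possible workaround, assuming the common convention where the vectorized env packs per-env entries into an object array (our sketch, not a confirmed fix for your specific library):

```python
import numpy as np

# info["additional_reward"] is an object array of length env_num whose elements
# are (dim,) arrays; stacking the elements recovers shape (env_num, dim).
additional = np.stack(list(info["additional_reward"]))
```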
    David Silvers RL Course in 2023, couple of questions
    Course Link I've heard a lot of positive reviews about it, and looking at the curriculum, it looks very appealing. However, the course was published in 2015, before AlphaGo/AlphaZero were even made, so I feel a bit hesitant because of that. But I've also heard that there haven't been many fundamentally new additions to RL recently. Should I jump in, or is it a bit outdated? If so, what are the outdated parts? After finishing the course I'm thinking of trying out various RL algorithms (there don't seem to be too many of them around), then potentially, eventually, moving on to self-supervised learning, which seems to be more popular these days. submitted by /u/nmegoCAD [link] [comments]  ( 8 min )
    Recent behavior cloning papers
    Where can I find recent behavior cloning papers and their performance benchmarks on datasets such as the Atari 2600 games? submitted by /u/Shivaram_3223 [link] [comments]  ( 8 min )
    Zoomposium with Dimitri Coelho Mollo (Assistant Professor in Philosophy of Artificial Intelligence): "How Intelligent is Artificial Intelligence?"
    #Zoomposium with Dimitri Coelho #Mollo (Assistant Professor in Philosophy of #Artificial #Intelligence): "How Intelligent is #Artificial #Intelligence?" In this new episode of our "#Zoomposium interview series" on the "Artificial Intelligence" topic series, my colleague Axel #Stöcker from the "Blog der großen Fragen" and I invited a guest interviewer, Yervant #Kulbashian (Engineering Manager at a Canadian AI platform). Yervant had already written a trilogy of guest posts, "The Green Swan", on my site on the topic of developing language-based logic on machines. For this reason, and since he is a "native speaker", we invited him to participate in our interview with Dimitri. Professor Dimitri Coelho Mollo is a #philosopher of science specializing in Artificial Intelligence and Cognitive Scienc…  ( 9 min )
    Equilibrium concepts of Game Theory in MARL
    Did anyone try to implement RL algorithms which converge to Nash/Correlated equilibrium in a game? submitted by /u/sinamkadira [link] [comments]  ( 8 min )
    Is it underfitting? How can I overcome this?
    Hello! I am very new to reinforcement learning. I want to build a pathfinding AI with PPO for study purposes, but in many simulations it performs badly. I have tried several things, still with no improvement... Please take a look at my project. https://preview.redd.it/ui4s6d61phab1.png?width=489&format=png&auto=webp&s=3869c8d5525e62ce689c79fbb5e5aa35162485a6 Aim: make the agent move to the target while passing some obstacles. Environment: Agent, Wall, Target, and Tile (empty space). The tiles represent the distance to the target. https://preview.redd.it/3ae7xau2phab1.png?width=489&format=png&auto=webp&s=7f5e3efd889cc32ed9c5e4da34a2052cf1b23000 Each tile gets a float in [0, 1] based on its distance to the target: close to the target, around 1; far from the target, around 0. h…  ( 9 min )
    Question about MARL Qmix
    Hi everyone, I've been studying MARL algorithms recently, notably VDN and QMIX, and I noticed the authors used a DRQN network to represent the Q-values. I was just wondering if there's any paper out there that studied the importance of the RNN, or showed that QMIX works with just a simple DQN, say for a simpler problem with a shorter time horizon? Thanks! submitted by /u/Ingenuity39 [link] [comments]  ( 8 min )
    Problems with Implementing reinforcement learning (TD3) in ROS Gazebo with turtlebot3
    Hello, I have recently started working on deep reinforcement learning for my master's project. I am not very good with neural networks, PyTorch, etc. My task is local path planning for a delivery robot. To simplify the problem, I assume the sidewalk has walls around it (to simulate sidewalk edge detection); the robot's goal is to reach the local checkpoint provided by the global planner while avoiding collisions with dynamic obstacles. To start and test an RL algorithm, I am using a pre-built Gazebo world model (it has some obstacles) with a TurtleBot3. For the first test, I have given it a goal position to reach. The goal starts close to the robot and gradually moves away. I am using a TD3 implementation taken from a GitHub project. I am using 3 different Python files, o…  ( 9 min )
    undergrads interested in RL
    As an undergrad (at a prestigious university) interested in doing RL research engineering work in the future, what's the best way to get into top labs (other than working on research at the university)? Do industry labs take on interns who don't have PhDs? Full-timers? This is for research engineer, not research scientist, btw. If yes, what are the best signals from people who work there after undergrad? (Good publications?) submitted by /u/randomly_lit_guy [link] [comments]  ( 8 min )
  • Open

    Learning the language of molecules to predict their properties
    This AI system only needs a small amount of data to predict molecular properties, which could speed up drug discovery and material development.  ( 9 min )
  • Open

    Multiplicative Updates for Online Convex Optimization over Symmetric Cones. (arXiv:2307.03136v1 [math.OC])
    We study online convex optimization where the possible actions are trace-one elements in a symmetric cone, generalizing the extensively-studied experts setup and its quantum counterpart. Symmetric cones provide a unifying framework for some of the most important optimization models, including linear, second-order cone, and semidefinite optimization. Using tools from the field of Euclidean Jordan Algebras, we introduce the Symmetric-Cone Multiplicative Weights Update (SCMWU), a projection-free algorithm for online optimization over the trace-one slice of an arbitrary symmetric cone. We show that SCMWU is equivalent to Follow-the-Regularized-Leader and Online Mirror Descent with symmetric-cone negative entropy as regularizer. Using this structural result we show that SCMWU is a no-regret algorithm, and verify our theoretical results with extensive experiments. Our results unify and generalize the analysis for the Multiplicative Weights Update method over the probability simplex and the Matrix Multiplicative Weights Update method over the set of density matrices.
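    For context (our notation, not the paper's), the density-matrix special case that SCMWU generalizes is the Matrix Multiplicative Weights Update: given loss matrices $L_s$ and step size $\eta$,

    $$X_{t+1} \;=\; \frac{\exp\left(-\eta \sum_{s=1}^{t} L_s\right)}{\operatorname{tr}\,\exp\left(-\eta \sum_{s=1}^{t} L_s\right)},$$

    i.e. a matrix exponential normalized to trace one; SCMWU generalizes this update from density matrices to the trace-one slice of an arbitrary symmetric cone.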
    Trends in Machine Learning and Electroencephalogram (EEG): A Review for Undergraduate Researchers. (arXiv:2307.02819v1 [cs.HC])
    This paper presents a systematic literature review on Brain-Computer Interfaces (BCIs) in the context of Machine Learning. Our focus is on Electroencephalography (EEG) research, highlighting the latest trends as of 2023. The objective is to provide undergraduate researchers with an accessible overview of the BCI field, covering tasks, algorithms, and datasets. By synthesizing recent findings, our aim is to offer a fundamental understanding of BCI research, identifying promising avenues for future investigations.
    Beyond Intuition, a Framework for Applying GPs to Real-World Data. (arXiv:2307.03093v1 [cs.LG])
    Gaussian Processes (GPs) offer an attractive method for regression over small, structured and correlated datasets. However, their deployment is hindered by computational costs and limited guidelines on how to apply GPs beyond simple low-dimensional datasets. We propose a framework to identify the suitability of GPs to a given problem and how to set up a robust and well-specified GP model. The guidelines formalise the decisions of experienced GP practitioners, with an emphasis on kernel design and options for computational scalability. The framework is then applied to a case study of glacier elevation change yielding more accurate results at test time.
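    As a point of reference for the kind of model the framework targets, here is a minimal GP regression sketch with scikit-learn (our illustration; the paper's guidance concerns kernel design and scalability choices well beyond this):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, (30, 1))                        # small, structured dataset
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(30)

kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)  # kernel design is the key choice
gp = GaussianProcessRegressor(kernel=kernel).fit(X, y)
mean, std = gp.predict(X, return_std=True)             # predictive mean and uncertainty
```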
    TablEye: Seeing small Tables through the Lens of Images. (arXiv:2307.02491v1 [cs.LG])
    The exploration of few-shot tabular learning becomes imperative. Tabular data is a versatile representation that captures diverse information, yet it is not exempt from limitations, such as properties of the data and model size. Labeling extensive tabular data can be challenging, and it may not be feasible to capture every important feature. Few-shot tabular learning, however, remains relatively unexplored, primarily due to scarcity of shared information among independent datasets and the inherent ambiguity in defining boundaries within tabular data. To the best of our knowledge, no meaningful and unrestricted few-shot tabular learning techniques have been developed without imposing constraints on the dataset. In this paper, we propose an innovative framework called TablEye, which aims to overcome the limit of forming prior knowledge for tabular data by adopting domain transformation. It facilitates domain transformation by generating tabular images, which effectively conserve the intrinsic semantics of the original tabular data. This approach harnesses rigorously tested few-shot learning algorithms and embedding functions to acquire and apply prior knowledge. Leveraging shared data domains allows us to utilize this prior knowledge, originally learned from the image domain. Specifically, TablEye demonstrated superior performance by outstripping TabLLM in a 4-shot task by a maximum of 0.11 AUC, and STUNT in a 1-shot setting, where it led on average by 3.17% accuracy.
    Push Past Green: Learning to Look Behind Plant Foliage by Moving It. (arXiv:2307.03175v1 [cs.RO])
    Autonomous agriculture applications (e.g., inspection, phenotyping, plucking fruits) require manipulating the plant foliage to look behind the leaves and the branches. Partial visibility, extreme clutter, thin structures, and unknown geometry and dynamics for plants make such manipulation challenging. We tackle these challenges through data-driven methods. We use self-supervision to train SRPNet, a neural network that predicts what space is revealed on execution of a candidate action on a given plant. We use SRPNet with the cross-entropy method to predict actions that are effective at revealing space beneath plant foliage. Furthermore, as SRPNet does not just predict how much space is revealed but also where it is revealed, we can execute a sequence of actions that incrementally reveal more and more space beneath the plant foliage. We experiment with a synthetic (vines) and a real plant (Dracaena) on a physical test-bed across 5 settings including 2 settings that test generalization to novel plant configurations. Our experiments reveal the effectiveness of our overall method, PPG, over a competitive hand-crafted exploration method, and the effectiveness of SRPNet over a hand-crafted dynamics model and relevant ablations.
    Benchmarking common uncertainty estimation methods with histopathological images under domain shift and label noise. (arXiv:2301.01054v2 [eess.IV] UPDATED)
    In the past years, deep learning has seen an increase in usage in the domain of histopathological applications. However, while these approaches have shown great potential, in high-risk environments deep learning models need to be able to judge their uncertainty and be able to reject inputs when there is a significant chance of misclassification. In this work, we conduct a rigorous evaluation of the most commonly used uncertainty and robustness methods for the classification of Whole Slide Images, with a focus on the task of selective classification, where the model should reject the classification in situations in which it is uncertain. We conduct our experiments on tile-level under the aspects of domain shift and label noise, as well as on slide-level. In our experiments, we compare Deep Ensembles, Monte-Carlo Dropout, Stochastic Variational Inference, Test-Time Data Augmentation as well as ensembles of the latter approaches. We observe that ensembles of methods generally lead to better uncertainty estimates as well as an increased robustness towards domain shifts and label noise, while contrary to results from classical computer vision benchmarks no systematic gain of the other methods can be shown. Across methods, a rejection of the most uncertain samples reliably leads to a significant increase in classification accuracy on both in-distribution as well as out-of-distribution data. Furthermore, we conduct experiments comparing these methods under varying conditions of label noise. Lastly, we publish our code framework to facilitate further research on uncertainty estimation on histopathological data.
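    For readers unfamiliar with one of the compared methods, here is a sketch of Monte-Carlo Dropout (our illustration, not the paper's code): keep dropout sampling at test time and average several stochastic forward passes:

```python
import torch

def mc_dropout_predict(model, x, T=20):
    """Average T stochastic forward passes with dropout left on."""
    model.train()  # keeps dropout sampling (caveat: also affects BatchNorm; restrict in real code)
    with torch.no_grad():
        probs = torch.stack([model(x).softmax(dim=-1) for _ in range(T)])
    return probs.mean(0), probs.std(0)  # mean prediction, per-class spread as uncertainty
```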
    Fast and Multi-aspect Mining of Complex Time-stamped Event Streams. (arXiv:2303.03789v2 [cs.LG] UPDATED)
    Given a huge, online stream of time-evolving events with multiple attributes, such as online shopping logs: (item, price, brand, time), and local mobility activities: (pick-up and drop-off locations, time), how can we summarize large, dynamic high-order tensor streams? How can we see any hidden patterns, rules, and anomalies? Our answer is to focus on two types of patterns, i.e., ''regimes'' and ''components'', for which we present CubeScope, an efficient and effective method over high-order tensor streams. Specifically, it identifies any sudden discontinuity and recognizes distinct dynamical patterns, ''regimes'' (e.g., weekday/weekend/holiday patterns). In each regime, it also performs multi-way summarization for all attributes (e.g., item, price, brand, and time) and discovers hidden ''components'' representing latent groups (e.g., item/brand groups) and their relationship. Thanks to its concise but effective summarization, CubeScope can also detect the sudden appearance of anomalies and identify the types of anomalies that occur in practice. Our proposed method has the following properties: (a) Effective: it captures dynamical multi-aspect patterns, i.e., regimes and components, and statistically summarizes all the events; (b) General: it is practical for successful application to data compression, pattern discovery, and anomaly detection on various types of tensor streams; (c) Scalable: our algorithm does not depend on the length of the data stream and its dimensionality. Extensive experiments on real datasets demonstrate that CubeScope finds meaningful patterns and anomalies correctly, and consistently outperforms the state-of-the-art methods as regards accuracy and execution speed.
    When Does Confidence-Based Cascade Deferral Suffice?. (arXiv:2307.02764v1 [cs.LG])
    Cascades are a classical strategy to enable inference cost to vary adaptively across samples, wherein a sequence of classifiers are invoked in turn. A deferral rule determines whether to invoke the next classifier in the sequence, or to terminate prediction. One simple deferral rule employs the confidence of the current classifier, e.g., based on the maximum predicted softmax probability. Despite being oblivious to the structure of the cascade -- e.g., not modelling the errors of downstream models -- such confidence-based deferral often works remarkably well in practice. In this paper, we seek to better understand the conditions under which confidence-based deferral may fail, and when alternate deferral strategies can perform better. We first present a theoretical characterisation of the optimal deferral rule, which precisely characterises settings under which confidence-based deferral may suffer. We then study post-hoc deferral mechanisms, and demonstrate they can significantly improve upon confidence-based deferral in settings where (i) downstream models are specialists that only work well on a subset of inputs, (ii) samples are subject to label noise, and (iii) there is distribution shift between the train and test set.
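    A sketch of the confidence-based deferral rule the paper analyzes (our illustration): invoke the cheap model first and defer to the larger one only when the max softmax probability falls below a threshold:

```python
import torch

def cascade_predict(small_model, large_model, x, tau=0.9):
    """Two-model cascade with confidence-based deferral (assumes a single input x)."""
    probs = small_model(x).softmax(dim=-1)
    conf, pred = probs.max(dim=-1)
    if conf.item() >= tau:       # confident enough: terminate early
        return pred
    return large_model(x).softmax(dim=-1).argmax(dim=-1)  # otherwise defer
```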
    SegNetr: Rethinking the local-global interactions and skip connections in U-shaped networks. (arXiv:2307.02953v1 [eess.IV])
    Recently, U-shaped networks have dominated the field of medical image segmentation due to their simple and easily tuned structure. However, existing U-shaped segmentation networks: 1) mostly focus on designing complex self-attention modules to compensate for the lack of long-term dependence based on convolution operation, which increases the overall number of parameters and computational complexity of the network; 2) simply fuse the features of encoder and decoder, ignoring the connection between their spatial locations. In this paper, we rethink the above problem and build a lightweight medical image segmentation network, called SegNetr. Specifically, we introduce a novel SegNetr block that can perform local-global interactions dynamically at any stage and with only linear complexity. At the same time, we design a general information retention skip connection (IRSC) to preserve the spatial location information of encoder features and achieve accurate fusion with the decoder features. We validate the effectiveness of SegNetr on four mainstream medical image segmentation datasets, with 59% and 76% fewer parameters and GFLOPs than vanilla U-Net, while achieving segmentation performance comparable to state-of-the-art methods. Notably, the components proposed in this paper can be applied to other U-shaped networks to improve their segmentation performance.
    Meta Learning of Interface Conditions for Multi-Domain Physics-Informed Neural Networks. (arXiv:2210.12669v2 [cs.LG] UPDATED)
    Physics-informed neural networks (PINNs) are emerging as popular mesh-free solvers for partial differential equations (PDEs). Recent extensions decompose the domain, apply different PINNs to solve the problem in each subdomain, and stitch the subdomains at the interface. Thereby, they can further alleviate the problem complexity, reduce the computational cost, and allow parallelization. However, the performance of multi-domain PINNs is sensitive to the choice of the interface conditions. While quite a few conditions have been proposed, there is no suggestion about how to select the conditions according to specific problems. To address this gap, we propose META Learning of Interface Conditions (METALIC), a simple, efficient yet powerful approach to dynamically determine appropriate interface conditions for solving a family of parametric PDEs. Specifically, we develop two contextual multi-arm bandit (MAB) models. The first one applies to the entire training course, and online updates a Gaussian process (GP) reward that given the PDE parameters and interface conditions predicts the performance. We prove a sub-linear regret bound for both UCB and Thompson sampling, which in theory guarantees the effectiveness of our MAB. The second one partitions the training into two stages, one is the stochastic phase and the other deterministic phase; we update a GP reward for each phase to enable different condition selections at the two stages to further bolster the flexibility and performance. We have shown the advantage of METALIC on four benchmark PDE families.
    Semi-supervised Learning from Street-View Images and OpenStreetMap for Automatic Building Height Estimation. (arXiv:2307.02574v1 [cs.CV])
    Accurate building height estimation is key to the automatic derivation of 3D city models from emerging big geospatial data, including Volunteered Geographical Information (VGI). However, an automatic solution for large-scale building height estimation based on low-cost VGI data is currently missing. The fast development of VGI data platforms, especially OpenStreetMap (OSM) and crowdsourced street-view images (SVI), offers a stimulating opportunity to fill this research gap. In this work, we propose a semi-supervised learning (SSL) method of automatically estimating building height from Mapillary SVI and OSM data to generate low-cost and open-source 3D city modeling in LoD1. The proposed method consists of three parts: first, we propose an SSL schema with the option of setting a different ratio of "pseudo label" during the supervised regression; second, we extract multi-level morphometric features from OSM data (i.e., buildings and streets) for the purpose of inferring building height; last, we design a building floor estimation workflow with a pre-trained facade object detection network to generate "pseudo label" from SVI and assign it to the corresponding OSM building footprint. In a case study, we validate the proposed SSL method in the city of Heidelberg, Germany, and evaluate the model performance against the reference data of building heights. Based on three different regression models, namely Random Forest (RF), Support Vector Machine (SVM), and Convolutional Neural Network (CNN), the SSL method leads to a clear performance boost in estimating building heights, with a Mean Absolute Error (MAE) around 2.1 meters, which is competitive to state-of-the-art approaches. The preliminary result is promising and motivates our future work in scaling up the proposed method based on low-cost VGI data, with possibilities even in regions and areas with diverse data quality and availability.
    DPM: Clustering Sensitive Data through Separation. (arXiv:2307.02969v1 [cs.CR])
    Privacy-preserving clustering groups data points in an unsupervised manner whilst ensuring that sensitive information remains protected. Previous privacy-preserving clustering focused on identifying concentration of point clouds. In this paper, we take another path and focus on identifying appropriate separators that split a data set. We introduce the novel differentially private clustering algorithm DPM that searches for accurate data point separators in a differentially private manner. DPM addresses two key challenges for finding accurate separators: identifying separators that are large gaps between clusters instead of small gaps within a cluster and, to efficiently spend the privacy budget, prioritising separators that split the data into large subparts. Using the differentially private Exponential Mechanism, DPM randomly chooses cluster separators with provably high utility: For a data set $D$, if there is a wide low-density separator in the central $60\%$ quantile, DPM finds that separator with probability $1 - \exp(-\sqrt{|D|})$. Our experimental evaluation demonstrates that DPM achieves significant improvements in terms of the clustering metric inertia. With the inertia results of the non-private KMeans++ as a baseline, for $\varepsilon = 1$ and $\delta=10^{-5}$ DPM improves upon the difference to the baseline by up to $50\%$ for a synthetic data set and by up to $62\%$ for a real-world data set compared to a state-of-the-art clustering algorithm by Chang and Kamath.
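    For context (the standard definition, not the paper's specific utility function), the Exponential Mechanism that DPM builds on selects a candidate output $r$ with probability

    $$\Pr[r] \;\propto\; \exp\!\left(\frac{\varepsilon \, u(D, r)}{2\,\Delta u}\right),$$

    where $u(D, r)$ scores the utility of output $r$ on data set $D$ and $\Delta u$ is the sensitivity of $u$; in DPM, high-utility outputs are wide, low-density separators.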
    A Hybrid End-to-End Spatio-Temporal Attention Neural Network with Graph-Smooth Signals for EEG Emotion Recognition. (arXiv:2307.03068v1 [cs.LG])
    Recently, physiological data such as electroencephalography (EEG) signals have attracted significant attention in affective computing. In this context, the main goal is to design an automated model that can assess emotional states. Lately, deep neural networks have shown promising performance in emotion recognition tasks. However, designing a deep architecture that can extract practical information from raw data is still a challenge. Here, we introduce a deep neural network that acquires interpretable physiological representations by a hybrid structure of spatio-temporal encoding and recurrent attention network blocks. Furthermore, a preprocessing step is applied to the raw data using graph signal processing tools to perform graph smoothing in the spatial domain. We demonstrate that our proposed architecture exceeds state-of-the-art results for emotion classification on the publicly available DEAP dataset. To explore the generality of the learned model, we also evaluate the performance of our architecture towards transfer learning (TL) by transferring the model parameters from a specific source to other target domains. Using DEAP as the source dataset, we demonstrate the effectiveness of our model in performing cross-modality TL and improving emotion classification accuracy on DREAMER and the Emotional English Word (EEWD) datasets, which involve EEG-based emotion classification tasks with different stimuli.
    Towards Symmetry-Aware Generation of Periodic Materials. (arXiv:2307.02707v1 [cs.LG])
    We consider the problem of generating periodic materials with deep models. While symmetry-aware molecule generation has been studied extensively, periodic materials possess different symmetries, which have not been completely captured by existing methods. In this work, we propose SyMat, a novel material generation approach that can capture physical symmetries of periodic material structures. SyMat generates atom types and lattices of materials through generating atom type sets, lattice lengths and lattice angles with a variational auto-encoder model. In addition, SyMat employs a score-based diffusion model to generate atom coordinates of materials, in which a novel symmetry-aware probabilistic model is used in the coordinate diffusion process. We show that SyMat is theoretically invariant to all symmetry transformations on materials and demonstrate that SyMat achieves promising performance on random generation and property optimization tasks.
    Density-based Feasibility Learning with Normalizing Flows for Introspective Robotic Assembly. (arXiv:2307.01317v2 [cs.RO] UPDATED)
    Machine Learning (ML) models in Robotic Assembly Sequence Planning (RASP) need to be introspective on the predicted solutions, i.e. whether they are feasible or not, to circumvent potential efficiency degradation. Previous works need both feasible and infeasible examples during training. However, the infeasible ones are hard to collect sufficiently when re-training is required for swift adaptation to new product variants. In this work, we propose a density-based feasibility learning method that requires only feasible examples. Concretely, we formulate the feasibility learning problem as Out-of-Distribution (OOD) detection with Normalizing Flows (NF), which are powerful generative models for estimating complex probability distributions. Empirically, the proposed method is demonstrated on robotic assembly use cases and outperforms other single-class baselines in detecting infeasible assemblies. We further investigate the internal working mechanism of our method and show that a large memory saving can be obtained based on an advanced variant of NF.
    STS-CCL: Spatial-Temporal Synchronous Contextual Contrastive Learning for Urban Traffic Forecasting. (arXiv:2307.02507v1 [cs.LG])
    Efficiently capturing the complex spatiotemporal representations from large-scale unlabeled traffic data remains a challenging task. In light of this dilemma, this work employs advanced contrastive learning and proposes a novel Spatial-Temporal Synchronous Contextual Contrastive Learning (STS-CCL) model. First, we elaborate the basic and strong augmentation methods for spatiotemporal graph data, which not only perturb the data in terms of graph structure and temporal characteristics, but also employ a learning-based dynamic graph view generator for adaptive augmentation. Second, we introduce a Spatial-Temporal Synchronous Contrastive Module (STS-CM) to simultaneously capture the decent spatial-temporal dependencies and realize graph-level contrasting. To further discriminate node individuals in negative filtering, a Semantic Contextual Contrastive method is designed based on semantic features and spatial heterogeneity, achieving node-level contrastive learning along with negative filtering. Finally, we present a hard mutual-view contrastive training scheme and extend the classic contrastive loss to an integrated objective function, yielding better performance. Extensive experiments and evaluations demonstrate that building a predictor upon the STS-CCL contrastive learning model gains superior performance over existing traffic forecasting benchmarks. The proposed STS-CCL is highly suitable for large datasets with only a few labeled data and other spatiotemporal tasks with data scarcity issues.
    Large Language Models Empowered Autonomous Edge AI for Connected Intelligence. (arXiv:2307.02779v1 [cs.IT])
    The evolution of wireless networks gravitates towards connected intelligence, a concept that envisions seamless interconnectivity among humans, objects, and intelligence in a hyper-connected cyber-physical world. Edge AI emerges as a promising solution to achieve connected intelligence by delivering high-quality, low-latency, and privacy-preserving AI services at the network edge. In this article, we introduce an autonomous edge AI system that automatically organizes, adapts, and optimizes itself to meet users' diverse requirements. The system employs a cloud-edge-client hierarchical architecture, where the large language model, i.e., Generative Pretrained Transformer (GPT), resides in the cloud, and other AI models are co-deployed on devices and edge servers. By leveraging the powerful abilities of GPT in language understanding, planning, and code generation, we present a versatile framework that efficiently coordinates edge AI models to cater to users' personal demands while automatically generating code to train new models via edge federated learning. Experimental results demonstrate the system's remarkable ability to accurately comprehend user demands, efficiently execute AI models with minimal cost, and effectively create high-performance AI models through federated learning.
    Potential sources of dataset bias complicate investigation of underdiagnosis by machine learning algorithms. (arXiv:2201.07856v2 [cs.AI] UPDATED)
    An increasing number of reports raise concerns about the risk that machine learning algorithms could amplify health disparities due to biases embedded in the training data. Seyyed-Kalantari et al. find that models trained on three chest X-ray datasets yield disparities in false-positive rates (FPR) across subgroups on the 'no-finding' label (indicating the absence of disease). The models consistently yield higher FPR on subgroups known to be historically underserved, and the study concludes that the models exhibit and potentially even amplify systematic underdiagnosis. We argue that the experimental setup in the study is insufficient to study algorithmic underdiagnosis. In the absence of specific knowledge (or assumptions) about the extent and nature of the dataset bias, it is difficult to investigate model bias. Importantly, their use of test data exhibiting the same bias as the training data (due to random splitting) severely complicates the interpretation of the reported disparities.
    DataPerf: Benchmarks for Data-Centric AI Development. (arXiv:2207.10062v2 [cs.LG] UPDATED)
    Machine learning research has long focused on models rather than datasets, and prominent datasets are used for common ML tasks without regard to the breadth, difficulty, and faithfulness of the underlying problems. Neglecting the fundamental importance of data has given rise to inaccuracy, bias, and fragility in real-world applications, and research is hindered by saturation across existing dataset benchmarks. In response, we present DataPerf, a community-led benchmark suite for evaluating ML datasets and data-centric algorithms. We aim to foster innovation in data-centric AI through competition, comparability, and reproducibility. We enable the ML community to iterate on datasets, instead of just architectures, and we provide an open, online platform with multiple rounds of challenges to support this iterative development. The first iteration of DataPerf contains five benchmarks covering a wide spectrum of data-centric techniques, tasks, and modalities in vision, speech, acquisition, debugging, and diffusion prompting, and we support hosting new contributed benchmarks from the community. The benchmarks, online evaluation platform, and baseline implementations are open source, and the MLCommons Association will maintain DataPerf to ensure long-term benefits to academia and industry.
    Policy Contrastive Imitation Learning. (arXiv:2307.02829v1 [cs.LG])
    Adversarial imitation learning (AIL) is a popular method that has recently achieved much success. However, the performance of AIL is still unsatisfactory on the more challenging tasks. We find that one major reason is the low quality of the AIL discriminator representation. Since the AIL discriminator is trained via binary classification that does not necessarily discriminate the policy from the expert in a meaningful way, the resulting reward might not be meaningful either. We propose a new method called Policy Contrastive Imitation Learning (PCIL) to resolve this issue. PCIL learns a contrastive representation space by anchoring on different policies and generates a smooth cosine-similarity-based reward. Our proposed representation learning objective can be viewed as a stronger version of the AIL objective and provides a more meaningful comparison between the agent and the policy. From a theoretical perspective, we show the validity of our method using the apprenticeship learning framework. Furthermore, our empirical evaluation on the DeepMind Control suite demonstrates that PCIL can achieve state-of-the-art performance. Finally, qualitative results suggest that PCIL builds a smoother and more meaningful representation space for imitation learning.
    Synthesizing Artistic Cinemagraphs from Text. (arXiv:2307.03190v1 [cs.CV])
    We introduce Artistic Cinemagraph, a fully automated method for creating cinemagraphs from text descriptions - an especially challenging task when prompts feature imaginary elements and artistic styles, given the complexity of interpreting the semantics and motions of these images. Existing single-image animation methods fall short on artistic inputs, and recent text-based video methods frequently introduce temporal inconsistencies, struggling to keep certain regions static. To address these challenges, we propose an idea of synthesizing image twins from a single text prompt - a pair of an artistic image and its pixel-aligned corresponding natural-looking twin. While the artistic image depicts the style and appearance detailed in our text prompt, the realistic counterpart greatly simplifies layout and motion analysis. Leveraging existing natural image and video datasets, we can accurately segment the realistic image and predict plausible motion given the semantic information. The predicted motion can then be transferred to the artistic image to create the final cinemagraph. Our method outperforms existing approaches in creating cinemagraphs for natural landscapes as well as artistic and other-worldly scenes, as validated by automated metrics and user studies. Finally, we demonstrate two extensions: animating existing paintings and controlling motion directions using text.
    Align With Purpose: Optimize Desired Properties in CTC Models with a General Plug-and-Play Framework. (arXiv:2307.01715v2 [cs.CL] UPDATED)
    Connectionist Temporal Classification (CTC) is a widely used criterion for training supervised sequence-to-sequence (seq2seq) models. It enables learning the relations between input and output sequences, termed alignments, by marginalizing over perfect alignments (that yield the ground truth), at the expense of imperfect alignments. This binary differentiation of perfect and imperfect alignments falls short of capturing other essential alignment properties that hold significance in other real-world applications. Here we propose $\textit{Align With Purpose}$, a $\textbf{general Plug-and-Play framework}$ for enhancing a desired property in models trained with the CTC criterion. We do that by complementing the CTC with an additional loss term that prioritizes alignments according to a desired property. Our method does not require any intervention in the CTC loss function, enables easy optimization of a variety of properties, and allows differentiation between both perfect and imperfect alignments. We apply our framework in the domain of Automatic Speech Recognition (ASR) and show its generality in terms of property selection, architectural choice, and scale of training dataset (up to 280,000 hours). To demonstrate the effectiveness of our framework, we apply it to two unrelated properties: emission time and word error rate (WER). For the former, we report an improvement of up to 570ms in latency optimization with a minor reduction in WER, and for the latter, we report a relative improvement of 4.5% WER over the baseline models. To the best of our knowledge, these applications have never been demonstrated to work on a scale of data as large as ours. Notably, our method can be implemented using only a few lines of code, and can be extended to other alignment-free loss functions and to domains other than ASR.
    A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond. (arXiv:2204.09269v2 [cs.CL] UPDATED)
    Non-autoregressive (NAR) generation, which is first proposed in neural machine translation (NMT) to speed up inference, has attracted much attention in both machine learning and natural language processing communities. While NAR generation can significantly accelerate inference speed for machine translation, the speedup comes at the cost of sacrificed translation accuracy compared to its counterpart, autoregressive (AR) generation. In recent years, many new models and algorithms have been designed/proposed to bridge the accuracy gap between NAR generation and AR generation. In this paper, we conduct a systematic survey with comparisons and discussions of various non-autoregressive translation (NAT) models from different aspects. Specifically, we categorize the efforts of NAT into several groups, including data manipulation, modeling methods, training criterion, decoding algorithms, and the benefit from pre-trained models. Furthermore, we briefly review other applications of NAR models beyond machine translation, such as grammatical error correction, text summarization, text style transfer, dialogue, semantic parsing, automatic speech recognition, and so on. In addition, we also discuss potential directions for future exploration, including releasing the dependency of KD, reasonable training objectives, pre-training for NAR, and wider applications, etc. We hope this survey can help researchers capture the latest progress in NAR generation, inspire the design of advanced NAR models and algorithms, and enable industry practitioners to choose appropriate solutions for their applications. The web page of this survey is at \url{https://github.com/LitterBrother-Xiao/Overview-of-Non-autoregressive-Applications}.
    Active Class Selection for Few-Shot Class-Incremental Learning. (arXiv:2307.02641v1 [cs.RO])
    For real-world applications, robots will need to continually learn in their environments through limited interactions with their users. Toward this, previous works in few-shot class incremental learning (FSCIL) and active class selection (ACS) have achieved promising results but were tested in constrained setups. Therefore, in this paper, we combine ideas from FSCIL and ACS to develop a novel framework that can allow an autonomous agent to continually learn new objects by asking its users to label only a few of the most informative objects in the environment. To this end, we build on a state-of-the-art (SOTA) FSCIL model and extend it with techniques from ACS literature. We term this model Few-shot Incremental Active class SeleCtiOn (FIASco). We further integrate a potential field-based navigation technique with our model to develop a complete framework that can allow an agent to process and reason on its sensory data through the FIASco model, navigate towards the most informative object in the environment, gather data about the object through its sensors and incrementally update the FIASco model. Experimental results on a simulated agent and a real robot show the significance of our approach for long-term real-world robotics applications.
    Layer-wise Adaptive Step-Sizes for Stochastic First-Order Methods for Deep Learning. (arXiv:2305.13664v3 [cs.LG] UPDATED)
    We propose a new per-layer adaptive step-size procedure for stochastic first-order optimization methods for minimizing empirical loss functions in deep learning, eliminating the need for the user to tune the learning rate (LR). The proposed approach exploits the layer-wise stochastic curvature information contained in the diagonal blocks of the Hessian in deep neural networks (DNNs) to compute adaptive step-sizes (i.e., LRs) for each layer. The method has memory requirements that are comparable to those of first-order methods, while its per-iteration time complexity is only increased by an amount that is roughly equivalent to an additional gradient computation. Numerical experiments show that SGD with momentum and AdamW combined with the proposed per-layer step-sizes are able to choose effective LR schedules and outperform fine-tuned LR versions of these methods as well as popular first-order and second-order algorithms for training DNNs on Autoencoder, Convolutional Neural Network (CNN) and Graph Convolutional Network (GCN) models. Finally, it is proved that an idealized version of SGD with the layer-wise step sizes converges linearly when using full-batch gradients.
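    For readers unfamiliar with the mechanics, per-layer step sizes map directly onto PyTorch parameter groups; the sketch below shows only this plumbing (our illustration; the paper's contribution, computing the LRs from layer-wise curvature, is not shown, and the values here are hypothetical):

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
layer_lrs = [1e-2, 1e-3]  # hypothetical per-layer step sizes (the paper computes these)
optimizer = torch.optim.SGD(
    [{"params": layer.parameters(), "lr": lr}
     for layer, lr in zip([model[0], model[2]], layer_lrs)],
    lr=1e-3,       # default for any group without its own lr
    momentum=0.9,
)
```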
    Renewable energy management in smart home environment via forecast embedded scheduling based on Recurrent Trend Predictive Neural Network. (arXiv:2307.01622v2 [cs.LG] UPDATED)
    Smart home energy management systems help the distribution grid operate more efficiently and reliably, and enable effective penetration of distributed renewable energy sources. These systems rely on robust forecasting, optimization, and control/scheduling algorithms that can handle the uncertain nature of demand and renewable generation. This paper proposes an advanced ML algorithm, called Recurrent Trend Predictive Neural Network based Forecast Embedded Scheduling (rTPNN-FES), to provide efficient residential demand control. rTPNN-FES is a novel neural network architecture that simultaneously forecasts renewable energy generation and schedules household appliances. By its embedded structure, rTPNN-FES eliminates the utilization of separate algorithms for forecasting and scheduling and generates a schedule that is robust against forecasting errors. This paper also evaluates the performance of the proposed algorithm for an IoT-enabled smart home. The evaluation results reveal that rTPNN-FES provides near-optimal scheduling $37.5$ times faster than the optimization while outperforming state-of-the-art forecasting techniques.
    Improving the Efficiency of Human-in-the-Loop Systems: Adding Artificial to Human Experts. (arXiv:2307.03003v1 [cs.LG])
    Information systems increasingly leverage artificial intelligence (AI) and machine learning (ML) to generate value from vast amounts of data. However, ML models are imperfect and can generate incorrect classifications. Hence, human-in-the-loop (HITL) extensions to ML models add a human review for instances that are difficult to classify. This study argues that continuously relying on human experts to handle difficult model classifications leads to a strong increase in human effort, which strains limited resources. To address this issue, we propose a hybrid system that creates artificial experts that learn to classify data instances from unknown classes previously reviewed by human experts. Our hybrid system assesses which artificial expert is suitable for classifying an instance from an unknown class and automatically assigns it. Over time, this reduces human effort and increases the efficiency of the system. Our experiments demonstrate that our approach outperforms traditional HITL systems for several benchmarks on image classification.
    Hierarchical Empowerment: Towards Tractable Empowerment-Based Skill-Learning. (arXiv:2307.02728v1 [cs.LG])
General purpose agents will require large repertoires of skills. Empowerment -- the maximum mutual information between skills and states -- provides a pathway for learning large collections of distinct skills, but mutual information is difficult to optimize. We introduce a new framework, Hierarchical Empowerment, that makes computing empowerment more tractable by integrating concepts from Goal-Conditioned Hierarchical Reinforcement Learning. Our framework makes two specific contributions. First, we introduce a new variational lower bound on mutual information that can be used to compute empowerment over short horizons. Second, we introduce a hierarchical architecture for computing empowerment over exponentially longer time scales. We verify the contributions of the framework in a series of simulated robotics tasks. In a popular ant navigation domain, our four-level agents are able to learn skills that cover a surface area over two orders of magnitude larger than prior work.
    Learning Disentangled Representations in Signed Directed Graphs without Social Assumptions. (arXiv:2307.03077v1 [cs.LG])
Signed graphs are complex systems that represent trust relationships or preferences in various domains. Learning node representations in such graphs is crucial for many mining tasks. Although real-world signed relationships can be influenced by multiple latent factors, most existing methods often oversimplify the modeling of signed relationships by relying on social theories and treating them as simplistic factors. This limits their expressiveness and their ability to capture the diverse factors that shape these relationships. In this paper, we propose DINES, a novel method for learning disentangled node representations in signed directed graphs without social assumptions. We adopt a disentangled framework that separates each embedding into distinct factors, allowing for capturing multiple latent factors. We also explore lightweight graph convolutions that focus solely on sign and direction, without depending on social theories. Additionally, we propose a decoder that effectively classifies an edge's sign by considering correlations between the factors. To further enhance disentanglement, we jointly train a self-supervised factor discriminator with our encoder and decoder. Through extensive experiments on real-world signed directed graphs, we show that DINES effectively learns disentangled node representations, and significantly outperforms its competitors in the sign prediction task.
    Topology-Aware Loss for Aorta and Great Vessel Segmentation in Computed Tomography Images. (arXiv:2307.03137v1 [eess.IV])
Segmentation networks are not explicitly constrained to learn global invariants of an image, such as the shape of an object and the geometry between multiple objects, when they are trained with a standard loss function. On the other hand, incorporating such invariants into network training may help improve performance for various segmentation tasks when they are the intrinsic characteristics of the objects to be segmented. One example is the segmentation of the aorta and great vessels in computed tomography (CT) images, where vessels are found in a particular geometry in the body due to human anatomy and mostly appear as round objects on a 2D CT image. This paper addresses this issue by introducing a new topology-aware loss function that penalizes topology dissimilarities between the ground truth and prediction through persistent homology. Different from previously suggested segmentation network designs, which apply threshold filtration on a likelihood function of the prediction map and the Betti numbers of the ground truth, this paper proposes to apply the Vietoris-Rips filtration to obtain persistence diagrams of both the ground truth and prediction maps, and to calculate the dissimilarity with the Wasserstein distance between the corresponding persistence diagrams. This filtration has the advantage of modeling shape and geometry at the same time, which may not happen when threshold filtration is applied. Our experiments on 4327 CT images of 24 subjects reveal that the proposed topology-aware loss function leads to better results than its counterparts, indicating the effectiveness of this approach.
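A minimal sketch of the dissimilarity computation, assuming the gudhi library (with its optional POT backend for Wasserstein distances); converting a soft prediction map into a point set, and the integration into a differentiable training loss, are simplified away here.

```python
# Sketch: topology dissimilarity between two binary masks via Vietoris-Rips
# persistence and the Wasserstein distance (gudhi assumed; grid size,
# subsampling, and max_edge are illustrative choices).
import numpy as np
import gudhi
from gudhi.wasserstein import wasserstein_distance

def persistence_diagram(mask, dim=1, max_edge=8.0):
    """Rips persistence of the foreground pixel coordinates of a 2D mask."""
    points = np.argwhere(mask).astype(float)[::3]   # subsample for speed
    st = gudhi.RipsComplex(points=points, max_edge_length=max_edge) \
              .create_simplex_tree(max_dimension=dim + 1)
    st.compute_persistence()
    return st.persistence_intervals_in_dimension(dim)

# Toy ground truth vs. prediction: a ring (one 1-dimensional hole) vs. a blob.
yy, xx = np.mgrid[0:32, 0:32]
r2 = (xx - 16) ** 2 + (yy - 16) ** 2
ring, blob = (r2 < 100) & (r2 > 36), r2 < 100

d = wasserstein_distance(persistence_diagram(ring), persistence_diagram(blob),
                         order=1.0)
print("topology dissimilarity:", d)   # the quantity penalized by the loss
```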
    Convolutional Filtering and Neural Networks with Non Commutative Algebras. (arXiv:2108.09923v3 [cs.LG] UPDATED)
    In this paper we introduce and study the algebraic generalization of non commutative convolutional neural networks. We leverage the theory of algebraic signal processing to model convolutional non commutative architectures, and we derive concrete stability bounds that extend those obtained in the literature for commutative convolutional neural networks. We show that non commutative convolutional architectures can be stable to deformations on the space of operators. We develop the spectral representation of non commutative signal models to show that non commutative filters process Fourier components independently of each other. In particular we prove that although the spectral decompositions of signals in non commutative models are associated to eigenspaces of dimension larger than one, there exists a trade-off between stability and selectivity, which is controlled by matrix polynomial functions in spaces of matrices of low dimension. This tradeoff shows how when the filters in the algebra are restricted to be stable, there is a loss in discriminability that is compensated in the network by the pointwise nonlinearities. The results derived in this paper have direct applications and implications in non commutative convolutional architectures such as group neural networks, multigraph neural networks, and quaternion neural networks, for which we provide a set of numerical experiments showing their behavior when perturbations are present.
    PUFFIN: A Path-Unifying Feed-Forward Interfaced Network for Vapor Pressure Prediction. (arXiv:2307.02903v1 [physics.chem-ph])
    Accurately predicting vapor pressure is vital for various industrial and environmental applications. However, obtaining accurate measurements for all compounds of interest is not possible due to the resource and labor intensity of experiments. The demand for resources and labor further multiplies when a temperature-dependent relationship for predicting vapor pressure is desired. In this paper, we propose PUFFIN (Path-Unifying Feed-Forward Interfaced Network), a machine learning framework that combines transfer learning with a new inductive bias node inspired by domain knowledge (the Antoine equation) to improve vapor pressure prediction. By leveraging inductive bias and transfer learning using graph embeddings, PUFFIN outperforms alternative strategies that do not use inductive bias or that use generic descriptors of compounds. The framework's incorporation of domain-specific knowledge to overcome the limitation of poor data availability shows its potential for broader applications in chemical compound analysis, including the prediction of other physicochemical properties. Importantly, our proposed machine learning framework is partially interpretable, because the inductive Antoine node yields network-derived Antoine equation coefficients. It would then be possible to directly incorporate the obtained analytical expression in process design software for better prediction and control of processes occurring in industry and the environment.
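The Antoine equation itself is standard, $\log_{10} P = A - B/(C + T)$. Below is a hedged sketch of what an inductive-bias node built around it could look like; the layer sizes, positivity constraints, and names are assumptions for illustration, not the paper's architecture.

```python
# Sketch: an Antoine-equation inductive-bias node. A small head predicts
# compound-specific coefficients (A, B, C) from an embedding, and the
# (log) vapor pressure follows analytically, so the learned coefficients
# stay interpretable.
import torch
import torch.nn as nn

class AntoineNode(nn.Module):
    """Predicts Antoine coefficients from a compound embedding, then
    computes log10(P) = A - B / (C + T) analytically."""
    def __init__(self, emb_dim=64):
        super().__init__()
        self.coeffs = nn.Sequential(
            nn.Linear(emb_dim, 32), nn.ReLU(), nn.Linear(32, 3)
        )

    def forward(self, emb, temperature):
        A, B, C = self.coeffs(emb).unbind(dim=-1)
        # Softplus keeps B positive and C + T away from zero (assumption).
        B = nn.functional.softplus(B)
        C = nn.functional.softplus(C) + 1.0
        return A - B / (C + temperature)   # log10 vapor pressure

node = AntoineNode()
emb = torch.randn(8, 64)        # e.g., a graph embedding of the compound
T = torch.full((8,), 350.0)     # temperature in Kelvin
log10_p = node(emb, T)
```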
    TGRL: An Algorithm for Teacher Guided Reinforcement Learning. (arXiv:2307.03186v1 [cs.LG])
    Learning from rewards (i.e., reinforcement learning or RL) and learning to imitate a teacher (i.e., teacher-student learning) are two established approaches for solving sequential decision-making problems. To combine the benefits of these different forms of learning, it is common to train a policy to maximize a combination of reinforcement and teacher-student learning objectives. However, without a principled method to balance these objectives, prior work used heuristics and problem-specific hyperparameter searches to balance the two objectives. We present a $\textit{principled}$ approach, along with an approximate implementation for $\textit{dynamically}$ and $\textit{automatically}$ balancing when to follow the teacher and when to use rewards. The main idea is to adjust the importance of teacher supervision by comparing the agent's performance to the counterfactual scenario of the agent learning without teacher supervision and only from rewards. If using teacher supervision improves performance, the importance of teacher supervision is increased and otherwise it is decreased. Our method, $\textit{Teacher Guided Reinforcement Learning}$ (TGRL), outperforms strong baselines across diverse domains without hyper-parameter tuning.
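As a toy illustration of the balancing idea only (the paper's approximate implementation is more involved): the teacher weight rises while supervision beats the counterfactual reward-only learner, and decays once it does not. The update rule and performance curves below are simulated stand-ins, not an actual RL run.

```python
# Schematic, self-contained sketch of TGRL-style dynamic balancing
# (assumed update rule; toy performance curves).
alpha, alpha_lr = 1.0, 0.05          # teacher-supervision weight, step size
for t in range(200):
    # Toy dynamics: teacher guidance helps early, then plateaus below what
    # reward-only learning eventually reaches.
    perf_with_teacher = 0.8 - 0.3 * max(0, t - 100) / 100
    perf_reward_only = min(1.0, t / 150)
    # Raise alpha while supervision beats the counterfactual reward-only
    # learner; lower it otherwise (clipped at zero). The policy objective
    # would then be rl_loss + alpha * teacher_distillation_loss.
    alpha = max(0.0, alpha + alpha_lr * (perf_with_teacher - perf_reward_only))
print(f"final teacher weight: {alpha:.2f}")  # decays once the teacher stops helping
```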
    Chaos Theory and Adversarial Robustness. (arXiv:2210.13235v2 [cs.LG] UPDATED)
    Neural networks, being susceptible to adversarial attacks, should face a strict level of scrutiny before being deployed in critical or adversarial applications. This paper uses ideas from Chaos Theory to explain, analyze, and quantify the degree to which neural networks are susceptible to or robust against adversarial attacks. To this end, we present a new metric, the "susceptibility ratio," given by $\hat \Psi(h, \theta)$, which captures how greatly a model's output will be changed by perturbations to a given input. Our results show that susceptibility to attack grows significantly with the depth of the model, which has safety implications for the design of neural networks for production environments. We provide experimental evidence of the relationship between $\hat \Psi$ and the post-attack accuracy of classification models, as well as a discussion of its application to tasks lacking hard decision boundaries. We also demonstrate how to quickly and easily approximate the certified robustness radii for extremely large models, which until now has been computationally infeasible to calculate directly.
    Offline Reinforcement Learning with Imbalanced Datasets. (arXiv:2307.02752v1 [cs.LG])
The prevalent use of benchmarks in current offline reinforcement learning (RL) research has led to a neglect of the imbalance of real-world dataset distributions in the development of models. Real-world offline RL datasets are often imbalanced over the state space due to the challenge of exploration or safety considerations. In this paper, we specify properties of imbalanced datasets in offline RL, where the state coverage follows a power law distribution characterized by skewed policies. Theoretically and empirically, we show that typical offline RL methods based on distributional constraints, such as conservative Q-learning (CQL), are ineffective in extracting policies under the imbalanced dataset. Inspired by natural intelligence, we propose a novel offline RL method that augments CQL with a retrieval process to recall past related experiences, effectively alleviating the challenges posed by imbalanced datasets. We evaluate our method on several tasks in the context of imbalanced datasets with varying levels of imbalance, using a variant of D4RL. Empirical results demonstrate the superiority of our method over other baselines.
    OLR-WA Online Regression with Weighted Average. (arXiv:2307.02804v1 [cs.LG])
Machine learning requires a large amount of training data to build accurate models. Sometimes the data arrives over time, requiring significant storage space and recalculation of the model to account for the new data. Online learning addresses these issues by incrementally modifying the model as data is encountered, and then discarding the data. In this study we introduce a new online linear regression approach. Our approach combines newly arriving data with a previously existing model to create a new model. The introduced model, named OLR-WA (OnLine Regression with Weighted Average), uses user-defined weights to bias the results in favor of old or new data, providing flexibility in the face of changing data. We have conducted 2-D and 3-D experiments comparing OLR-WA to a static model that uses the entire data set. The results show that for consistent data, OLR-WA and the static batch model perform similarly, and for varying data, the user can set OLR-WA to adapt more quickly or to resist change.
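A minimal numpy sketch of the weighted-average idea; the batch handling and default weights are chosen for illustration rather than taken from the paper.

```python
# Sketch: online linear regression via a weighted average of the old model
# and a model fit to the newly arrived batch (details assumed).
import numpy as np

def fit_ols(X, y):
    X1 = np.hstack([X, np.ones((len(X), 1))])    # append intercept column
    w, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return w

def olr_wa_update(w_old, X_new, y_new, weight_old=0.7, weight_new=0.3):
    """Combine the existing model with one fit on the new batch only."""
    w_new = fit_ols(X_new, y_new)
    return (weight_old * w_old + weight_new * w_new) / (weight_old + weight_new)

rng = np.random.default_rng(0)
w = fit_ols(rng.normal(size=(50, 2)), rng.normal(size=50))  # initial batch
for _ in range(10):                                          # streaming batches
    X_b = rng.normal(size=(50, 2))
    y_b = X_b @ np.array([1.5, -2.0]) + 0.5 + rng.normal(scale=0.1, size=50)
    w = olr_wa_update(w, X_b, y_b)  # data can be discarded after the update
print("estimated [w1, w2, intercept]:", np.round(w, 2))
```

Tilting weight_old versus weight_new toward the new batch makes the model track drifting data; tilting it toward the old model makes it resist change.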
    Generation of Highlights from Research Papers Using Pointer-Generator Networks and SciBERT Embeddings. (arXiv:2302.07729v2 [cs.CL] UPDATED)
    Nowadays many research articles are prefaced with research highlights to summarize the main findings of the paper. Highlights not only help researchers precisely and quickly identify the contributions of a paper, they also enhance the discoverability of the article via search engines. We aim to automatically construct research highlights given certain segments of a research paper. We use a pointer-generator network with coverage mechanism and a contextual embedding layer at the input that encodes the input tokens into SciBERT embeddings. We test our model on a benchmark dataset, CSPubSum, and also present MixSub, a new multi-disciplinary corpus of papers for automatic research highlight generation. For both CSPubSum and MixSub, we have observed that the proposed model achieves the best performance compared to related variants and other models proposed in the literature. On the CSPubSum dataset, our model achieves the best performance when the input is only the abstract of a paper as opposed to other segments of the paper. It produces ROUGE-1, ROUGE-2 and ROUGE-L F1-scores of 38.26, 14.26 and 35.51, respectively, METEOR score of 32.62, and BERTScore F1 of 86.65 which outperform all other baselines. On the new MixSub dataset, where only the abstract is the input, our proposed model (when trained on the whole training corpus without distinguishing between the subject categories) achieves ROUGE-1, ROUGE-2 and ROUGE-L F1-scores of 31.78, 9.76 and 29.3, respectively, METEOR score of 24.00, and BERTScore F1 of 85.25.
    Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge. (arXiv:2307.02894v1 [cs.LG])
    Mixed-precision quantization, where a deep neural network's layers are quantized to different precisions, offers the opportunity to optimize the trade-offs between model size, latency, and statistical accuracy beyond what can be achieved with homogeneous-bit-width quantization. To navigate the intractable search space of mixed-precision configurations for a given network, this paper proposes a hybrid search methodology. It consists of a hardware-agnostic differentiable search algorithm followed by a hardware-aware heuristic optimization to find mixed-precision configurations latency-optimized for a specific hardware target. We evaluate our algorithm on MobileNetV1 and MobileNetV2 and deploy the resulting networks on a family of multi-core RISC-V microcontroller platforms with different hardware characteristics. We achieve up to 28.6% reduction of end-to-end latency compared to an 8-bit model at a negligible accuracy drop from a full-precision baseline on the 1000-class ImageNet dataset. We demonstrate speedups relative to an 8-bit baseline, even on systems with no hardware support for sub-byte arithmetic at negligible accuracy drop. Furthermore, we show the superiority of our approach with respect to differentiable search targeting reduced binary operation counts as a proxy for latency.
    Statistical-Computational Tradeoffs in Mixed Sparse Linear Regression. (arXiv:2303.02118v2 [stat.ML] UPDATED)
    We consider the problem of mixed sparse linear regression with two components, where two real $k$-sparse signals $\beta_1, \beta_2$ are to be recovered from $n$ unlabelled noisy linear measurements. The sparsity is allowed to be sublinear in the dimension, and additive noise is assumed to be independent Gaussian with variance $\sigma^2$. Prior work has shown that the problem suffers from a $\frac{k}{SNR^2}$-to-$\frac{k^2}{SNR^2}$ statistical-to-computational gap, resembling other computationally challenging high-dimensional inference problems such as Sparse PCA and Robust Sparse Mean Estimation; here $SNR$ is the signal-to-noise ratio. We establish the existence of a more extensive computational barrier for this problem through the method of low-degree polynomials, but show that the problem is computationally hard only in a very narrow symmetric parameter regime. We identify a smooth information-computation tradeoff between the sample complexity $n$ and runtime for any randomized algorithm in this hard regime. Via a simple reduction, this provides novel rigorous evidence for the existence of a computational barrier to solving exact support recovery in sparse phase retrieval with sample complexity $n = \tilde{o}(k^2)$. Our second contribution is to analyze a simple thresholding algorithm which, outside of the narrow regime where the problem is hard, solves the associated mixed regression detection problem in $O(np)$ time with square-root the number of samples and matches the sample complexity required for (non-mixed) sparse linear regression; this allows the recovery problem to be subsequently solved by state-of-the-art techniques from the dense case. As a special case of our results, we show that this simple algorithm is order-optimal among a large family of algorithms in solving exact signed support recovery in sparse linear regression.
    Time Series Clustering With Random Convolutional Kernels. (arXiv:2305.10457v2 [cs.LG] UPDATED)
    Time series data, spanning applications ranging from climatology to finance to healthcare, presents significant challenges in data mining due to its size and complexity. One open issue lies in time series clustering, which is crucial for processing large volumes of unlabeled time series data and unlocking valuable insights. Traditional and modern analysis methods, however, often struggle with these complexities. To address these limitations, we introduce R-Clustering, a novel method that utilizes convolutional architectures with randomly selected parameters. Through extensive evaluations, R-Clustering demonstrates superior performance over existing methods in terms of clustering accuracy, computational efficiency and scalability. Empirical results obtained using the UCR archive demonstrate the effectiveness of our approach across diverse time series datasets. The findings highlight the significance of R-Clustering in various domains and applications, contributing to the advancement of time series data mining.
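A compact sketch of the random-kernel recipe (ROCKET-style random kernels with proportion-of-positive-values pooling, followed by k-means); the kernel lengths and pooling are assumptions rather than the paper's exact configuration.

```python
# Sketch: clustering time series with features from random, untrained
# convolutional kernels (configuration assumed).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
series = rng.normal(size=(100, 256))            # 100 unlabeled time series

def random_kernel_features(X, n_kernels=200):
    feats = []
    for _ in range(n_kernels):
        w = rng.normal(size=rng.choice([7, 9, 11]))   # random weights
        b = rng.normal()
        conv = np.array([np.convolve(x, w, mode="valid") + b for x in X])
        feats.append((conv > 0).mean(axis=1))    # PPV pooling per series
    return np.column_stack(feats)

features = random_kernel_features(series)
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(features)
```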
    Exploring new ways: Enforcing representational dissimilarity to learn new features and reduce error consistency. (arXiv:2307.02516v1 [cs.LG])
Independently trained machine learning models tend to learn similar features. Given an ensemble of independently trained models, this results in correlated predictions and common failure modes. Previous attempts focusing on decorrelation of output predictions or logits yielded mixed results, particularly due to the reduction in model accuracy caused by conflicting optimization objectives. In this paper, we propose the novel idea of utilizing methods from the field of representational similarity to promote dissimilarity during training, instead of measuring the similarity of trained models. To this end, we promote intermediate representations to be dissimilar at different depths between architectures, with the goal of learning robust ensembles with disjoint failure modes. We show that highly dissimilar intermediate representations result in less correlated output predictions and slightly lower error consistency, resulting in higher ensemble accuracy. With this, we shed first light on the connection between intermediate representations and their impact on output predictions.
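The abstract does not name the similarity measure; linear CKA is one common choice from the representational-similarity literature, used below as an assumption for the dissimilarity penalty.

```python
# Sketch: penalizing representational similarity between two ensemble
# members' intermediate activations (linear CKA assumed as the measure).
import torch

def linear_cka(X, Y):
    """Linear CKA between activation matrices of shape (batch, features)."""
    X = X - X.mean(dim=0, keepdim=True)
    Y = Y - Y.mean(dim=0, keepdim=True)
    num = (Y.T @ X).norm(p="fro") ** 2
    den = (X.T @ X).norm(p="fro") * (Y.T @ Y).norm(p="fro")
    return num / (den + 1e-12)

def ensemble_loss(task_loss_a, task_loss_b, acts_a, acts_b, lam=0.1):
    # Add a penalty for similarity at matched intermediate depths.
    sim = sum(linear_cka(ha, hb) for ha, hb in zip(acts_a, acts_b))
    return task_loss_a + task_loss_b + lam * sim

# Toy usage with random activations from two depths of each model:
acts_a = [torch.randn(128, 64), torch.randn(128, 32)]
acts_b = [torch.randn(128, 64), torch.randn(128, 32)]
loss = ensemble_loss(torch.tensor(0.5), torch.tensor(0.7), acts_a, acts_b)
```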
    Multimodal Temporal Fusion Transformers Are Good Product Demand Forecasters. (arXiv:2307.02578v1 [cs.LG])
    Multimodal demand forecasting aims at predicting product demand utilizing visual, textual, and contextual information. This paper proposes a method for multimodal product demand forecasting using convolutional, graph-based, and transformer-based architectures. Traditional approaches to demand forecasting rely on historical demand, product categories, and additional contextual information such as seasonality and events. However, these approaches have several shortcomings, such as the cold start problem making it difficult to predict product demand until sufficient historical data is available for a particular product, and their inability to properly deal with category dynamics. By incorporating multimodal information, such as product images and textual descriptions, our architecture aims to address the shortcomings of traditional approaches and outperform them. The experiments conducted on a large real-world dataset show that the proposed approach effectively predicts demand for a wide range of products. The multimodal pipeline presented in this work enhances the accuracy and reliability of the predictions, demonstrating the potential of leveraging multimodal information in product demand forecasting.
    Pruning vs Quantization: Which is Better?. (arXiv:2307.02973v1 [cs.LG])
Neural network pruning and quantization techniques are almost as old as neural networks themselves. However, to date, only ad-hoc comparisons between the two have been published. In this paper, we set out to answer the question of which is better: neural network quantization or pruning? By answering this question, we hope to inform design decisions made on neural network hardware going forward. We provide an extensive comparison between the two techniques for compressing deep neural networks. First, we give an analytical comparison of expected quantization and pruning error for general data distributions. Then, we provide lower bounds for the per-layer pruning and quantization error in trained networks and compare these to empirical error after optimization. Finally, we provide an extensive experimental comparison for training 8 large-scale models on 3 tasks. Our results show that in most cases quantization outperforms pruning. Only in some scenarios with very high compression ratios might pruning be beneficial from an accuracy standpoint.
    Transfer Learning for the Efficient Detection of COVID-19 from Smartphone Audio Data. (arXiv:2307.02975v1 [cs.LG])
Disease detection from smartphone data represents an open research challenge in mobile health (m-health) systems. COVID-19 and its respiratory symptoms are an important case study in this area, and their early detection is a potential real instrument to counteract the pandemic situation. The efficacy of this solution mainly depends on the performance of the AI algorithms applied to the collected data and their possible implementation directly on the users' mobile devices. Considering these issues, and the limited amount of available data, in this paper we present the experimental evaluation of 3 different deep learning models, compared also with hand-crafted features, and of the two main approaches of transfer learning in the considered scenario: feature extraction and fine-tuning. Specifically, we considered VGGish, YAMNet, and L3-Net (including 12 different configurations) evaluated through user-independent experiments on 4 different datasets (13,447 samples in total). Results clearly show the advantages of L3-Net in all the experimental settings, as it outperforms the other solutions by 12.3% in terms of Precision-Recall AUC as a feature extractor, and by 10% when the model is fine-tuned. Moreover, we note that fine-tuning only the fully-connected layers of the pre-trained models generally leads to worse performance, with an average drop of 6.6% with respect to feature extraction. Finally, we evaluate the memory footprints of the different models for their possible applications on commercial mobile devices.
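For readers unfamiliar with the two transfer-learning regimes compared here, a generic PyTorch sketch, using a torchvision backbone as a stand-in for the paper's audio models:

```python
# Sketch: feature extraction vs. fine-tuning, shown on a generic pretrained
# torchvision backbone (the paper's backbones are VGGish, YAMNet, L3-Net).
import torch.nn as nn
from torchvision.models import mobilenet_v2

def build(strategy="feature_extraction", n_classes=2):
    model = mobilenet_v2(weights="DEFAULT")       # pretrained backbone
    if strategy == "feature_extraction":
        for p in model.parameters():              # freeze everything;
            p.requires_grad = False               # train only the new head
    model.classifier[-1] = nn.Linear(model.last_channel, n_classes)
    return model  # strategy="fine_tuning" leaves all parameters trainable
```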
    A Recursively Recurrent Neural Network (R2N2) Architecture for Learning Iterative Algorithms. (arXiv:2211.12386v2 [cs.LG] UPDATED)
    Meta-learning of numerical algorithms for a given task consists of the data-driven identification and adaptation of an algorithmic structure and the associated hyperparameters. To limit the complexity of the meta-learning problem, neural architectures with a certain inductive bias towards favorable algorithmic structures can, and should, be used. We generalize our previously introduced Runge-Kutta neural network to a recursively recurrent neural network (R2N2) superstructure for the design of customized iterative algorithms. In contrast to off-the-shelf deep learning approaches, it features a distinct division into modules for generation of information and for the subsequent assembly of this information towards a solution. Local information in the form of a subspace is generated by subordinate, inner, iterations of recurrent function evaluations starting at the current outer iterate. The update to the next outer iterate is computed as a linear combination of these evaluations, reducing the residual in this space, and constitutes the output of the network. We demonstrate that regular training of the weight parameters inside the proposed superstructure on input/output data of various computational problem classes yields iterations similar to Krylov solvers for linear equation systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta integrators for ordinary differential equations. Due to its modularity, the superstructure can be readily extended with functionalities needed to represent more general classes of iterative algorithms traditionally based on Taylor series expansions.
    Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation. (arXiv:2307.02598v1 [cs.LG])
    We tackle the problems of latent variables identification and "out-of-support" image generation in representation learning. We show that both are possible for a class of decoders that we call additive, which are reminiscent of decoders used for object-centric representation learning (OCRL) and well suited for images that can be decomposed as a sum of object-specific images. We provide conditions under which exactly solving the reconstruction problem using an additive decoder is guaranteed to identify the blocks of latent variables up to permutation and block-wise invertible transformations. This guarantee relies only on very weak assumptions about the distribution of the latent factors, which might present statistical dependencies and have an almost arbitrarily shaped support. Our result provides a new setting where nonlinear independent component analysis (ICA) is possible and adds to our theoretical understanding of OCRL methods. We also show theoretically that additive decoders can generate novel images by recombining observed factors of variations in novel ways, an ability we refer to as Cartesian-product extrapolation. We show empirically that additivity is crucial for both identifiability and extrapolation on simulated data.
    Airfoil GAN: Encoding and Synthesizing Airfoils for Aerodynamic Shape Optimization. (arXiv:2101.04757v2 [cs.CE] UPDATED)
The current design of aerodynamic shapes, like airfoils, involves computationally intensive simulations to explore the possible design space. Usually, such design relies on the prior definition of design parameters and places restrictions on synthesizing novel shapes. In this work, we propose a data-driven shape encoding and generating method, which automatically learns representations from existing airfoils and uses the learned representations to generate new airfoils. The representations are then used in the optimization of synthesized airfoil shapes based on their aerodynamic performance. Our model is built upon VAEGAN, a neural network that combines a Variational Autoencoder with a Generative Adversarial Network and is trained with gradient-based techniques. Our model can (1) encode an existing airfoil into a latent vector and reconstruct the airfoil from it, (2) generate novel airfoils by randomly sampling latent vectors and mapping the vectors to the airfoil coordinate domain, and (3) synthesize airfoils with desired aerodynamic properties by optimizing learned features via a genetic algorithm. Our experiments show that the learned features encode shape information thoroughly and comprehensively without predefined design parameters. By interpolating/extrapolating feature vectors or sampling from Gaussian noise, the model can automatically synthesize novel airfoil shapes, some of which possess competitive or even better aerodynamic properties compared to the airfoils used for training. By optimizing shapes in the learned latent domain via a genetic algorithm, synthesized airfoils can evolve toward target aerodynamic properties. This demonstrates an efficient learning-based airfoil design framework, which encodes and optimizes airfoils in the latent domain and synthesizes promising airfoil candidates for required aerodynamic performance.
    The Impact of ChatGPT and LLMs on Medical Imaging Stakeholders: Perspectives and Use Cases. (arXiv:2306.06767v2 [eess.IV] UPDATED)
    This study investigates the transformative potential of Large Language Models (LLMs), such as OpenAI ChatGPT, in medical imaging. With the aid of public data, these models, which possess remarkable language understanding and generation capabilities, are augmenting the interpretive skills of radiologists, enhancing patient-physician communication, and streamlining clinical workflows. The paper introduces an analytic framework for presenting the complex interactions between LLMs and the broader ecosystem of medical imaging stakeholders, including businesses, insurance entities, governments, research institutions, and hospitals (nicknamed BIGR-H). Through detailed analyses, illustrative use cases, and discussions on the broader implications and future directions, this perspective seeks to raise discussion in strategic planning and decision-making in the era of AI-enabled healthcare.
    Multi-Similarity Contrastive Learning. (arXiv:2307.02712v1 [cs.LG])
Given a similarity metric, contrastive methods learn a representation in which examples that are similar are pushed together and examples that are dissimilar are pulled apart. Contrastive learning techniques have been utilized extensively to learn representations for tasks ranging from image classification to caption generation. However, existing contrastive learning approaches can fail to generalize because they do not take into account the possibility of different similarity relations. In this paper, we propose a novel multi-similarity contrastive loss (MSCon) that learns generalizable embeddings by jointly utilizing supervision from multiple metrics of similarity. Our method automatically learns contrastive similarity weightings based on the uncertainty in the corresponding similarity, down-weighting uncertain tasks and leading to better out-of-domain generalization to new tasks. We show empirically that networks trained with MSCon outperform state-of-the-art baselines on in-domain and out-of-domain settings.
    Expert Aggregation for Financial Forecasting. (arXiv:2111.15365v4 [q-fin.ST] UPDATED)
Machine learning algorithms dedicated to financial time series forecasting have gained a lot of interest. But choosing between several algorithms can be challenging, as their estimation accuracy may be unstable over time. Online aggregation of experts combines the forecasts of a finite set of models in a single approach without making any assumption about the models. In this paper, a Bernstein Online Aggregation (BOA) procedure is applied to the construction of long-short strategies built from individual stock return forecasts coming from different machine learning models. The online mixture of experts leads to attractive portfolio performances even in environments characterised by non-stationarity. The aggregation outperforms individual algorithms, offering a higher portfolio Sharpe ratio and lower shortfall with similar turnover. Extensions to expert and aggregation specialisations are also proposed to improve the overall mixture on a family of portfolio evaluation metrics.
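BOA adds second-order refinements, but the classical exponentially weighted average forecaster conveys the aggregation principle; the toy data and learning rate below are illustrative assumptions.

```python
# Sketch: online expert aggregation with exponential weights (the classical
# forecaster; BOA refines this with second-order corrections).
import numpy as np

rng = np.random.default_rng(1)
T, eta = 500, 2.0
weights = np.ones(4) / 4                 # four experts, uniform prior
agg_losses = []
for t in range(T):
    target = np.sin(t / 20) + 0.1 * rng.normal()
    preds = target + rng.normal(scale=[0.1, 0.3, 0.5, 1.0])  # expert forecasts
    forecast = weights @ preds           # weighted-average forecast
    agg_losses.append((forecast - target) ** 2)
    losses = (preds - target) ** 2       # reweight experts by their losses
    weights *= np.exp(-eta * losses)
    weights /= weights.sum()
print("aggregate MSE:", np.mean(agg_losses), "final weights:", np.round(weights, 2))
```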
    Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation. (arXiv:2307.02842v1 [cs.LG])
    Risk-sensitive reinforcement learning (RL) aims to optimize policies that balance the expected reward and risk. In this paper, we investigate a novel risk-sensitive RL formulation with an Iterated Conditional Value-at-Risk (CVaR) objective under linear and general function approximations. This new formulation, named ICVaR-RL with function approximation, provides a principled way to guarantee safety at each decision step. For ICVaR-RL with linear function approximation, we propose a computationally efficient algorithm ICVaR-L, which achieves an $\widetilde{O}(\sqrt{\alpha^{-(H+1)}(d^2H^4+dH^6)K})$ regret, where $\alpha$ is the risk level, $d$ is the dimension of state-action features, $H$ is the length of each episode, and $K$ is the number of episodes. We also establish a matching lower bound $\Omega(\sqrt{\alpha^{-(H-1)}d^2K})$ to validate the optimality of ICVaR-L with respect to $d$ and $K$. For ICVaR-RL with general function approximation, we propose algorithm ICVaR-G, which achieves an $\widetilde{O}(\sqrt{\alpha^{-(H+1)}DH^4K})$ regret, where $D$ is a dimensional parameter that depends on the eluder dimension and covering number. Furthermore, our analysis provides several novel techniques for risk-sensitive RL, including an efficient approximation of the CVaR operator, a new ridge regression with CVaR-adapted features, and a refined elliptical potential lemma.
    Sample-Efficient Learning of POMDPs with Multiple Observations In Hindsight. (arXiv:2307.02884v1 [cs.LG])
This paper studies the sample-efficiency of learning in Partially Observable Markov Decision Processes (POMDPs), a challenging problem in reinforcement learning that is known to be exponentially hard in the worst-case. Motivated by real-world settings such as loading in game playing, we propose an enhanced feedback model called "multiple observations in hindsight", where after each episode of interaction with the POMDP, the learner may collect multiple additional observations emitted from the encountered latent states, but may not observe the latent states themselves. We show that sample-efficient learning under this feedback model is possible for two new subclasses of POMDPs: \emph{multi-observation revealing POMDPs} and \emph{distinguishable POMDPs}. Both subclasses generalize and substantially relax \emph{revealing POMDPs} -- a widely studied subclass for which sample-efficient learning is possible under standard trajectory feedback. Notably, distinguishable POMDPs only require the emission distributions from different latent states to be \emph{different} instead of \emph{linearly independent} as required in revealing POMDPs.
    Benchmarking Test-Time Adaptation against Distribution Shifts in Image Classification. (arXiv:2307.03133v1 [cs.LG])
Test-time adaptation (TTA) is a technique aimed at enhancing the generalization performance of models by leveraging unlabeled samples solely during prediction. Given the need for robustness in neural network systems when faced with distribution shifts, numerous TTA methods have recently been proposed. However, these methods are often evaluated under different settings, such as varying distribution shifts, backbones, and experimental designs, leading to a lack of consistent and fair benchmarks to validate their effectiveness. To address this issue, we present a benchmark that systematically evaluates 13 prominent TTA methods and their variants on five widely used image classification datasets: CIFAR-10-C, CIFAR-100-C, ImageNet-C, DomainNet, and Office-Home. These methods encompass a wide range of adaptation scenarios (e.g., online vs. offline adaptation, and instance vs. batch vs. domain adaptation). Furthermore, we explore the compatibility of different TTA methods with diverse network backbones. To implement this benchmark, we have developed a unified framework in PyTorch, which allows for consistent evaluation and comparison of the TTA methods across the different datasets and network architectures. By establishing this benchmark, we aim to provide researchers and practitioners with a reliable means of assessing and comparing the effectiveness of TTA methods in improving model robustness and generalization performance. Our code is available at https://github.com/yuyongcan/Benchmark-TTA.
    Adapting Neural Link Predictors for Complex Query Answering. (arXiv:2301.12313v2 [cs.LG] UPDATED)
    Answering complex queries on incomplete knowledge graphs is a challenging task where a model needs to answer complex logical queries in the presence of missing knowledge. Recently, Arakelyan et al. (2021); Minervini et al. (2022) showed that neural link predictors could also be used for answering complex queries: their Continuous Query Decomposition (CQD) method works by decomposing complex queries into atomic sub-queries, answers them using neural link predictors and aggregates their scores via t-norms for ranking the answers to each complex query. However, CQD does not handle negations and only uses the training signal from atomic training queries: neural link prediction scores are not calibrated to interact together via fuzzy logic t-norms during complex query answering. In this work, we propose to address this problem by training a parameter-efficient score adaptation model to re-calibrate neural link prediction scores: this new component is trained on complex queries by back-propagating through the complex query-answering process. Our method, CQD$^{A}$, produces significantly more accurate results than current state-of-the-art methods, improving from $34.4$ to $35.1$ Mean Reciprocal Rank values averaged across all datasets and query types while using $\leq 35\%$ of the available training query types. We further show that CQD$^{A}$ is data-efficient, achieving competitive results with only $1\%$ of the training data, and robust in out-of-domain evaluations.
    Active Policy Improvement from Multiple Black-box Oracles. (arXiv:2306.10259v2 [cs.LG] UPDATED)
    Reinforcement learning (RL) has made significant strides in various complex domains. However, identifying an effective policy via RL often necessitates extensive exploration. Imitation learning aims to mitigate this issue by using expert demonstrations to guide exploration. In real-world scenarios, one often has access to multiple suboptimal black-box experts, rather than a single optimal oracle. These experts do not universally outperform each other across all states, presenting a challenge in actively deciding which oracle to use and in which state. We introduce MAPS and MAPS-SE, a class of policy improvement algorithms that perform imitation learning from multiple suboptimal oracles. In particular, MAPS actively selects which of the oracles to imitate and improve their value function estimates, and MAPS-SE additionally leverages an active state exploration criterion to determine which states one should explore. We provide a comprehensive theoretical analysis and demonstrate that MAPS and MAPS-SE enjoy sample efficiency advantage over the state-of-the-art policy improvement algorithms. Empirical results show that MAPS-SE significantly accelerates policy optimization via state-wise imitation learning from multiple oracles across a broad spectrum of control tasks in the DeepMind Control Suite. Our code is publicly available at: https://github.com/ripl/maps.
    How accurate are existing land cover maps for agriculture in Sub-Saharan Africa?. (arXiv:2307.02575v1 [cs.LG])
    Satellite Earth observations (EO) can provide affordable and timely information for assessing crop conditions and food production. Such monitoring systems are essential in Africa, where there is high food insecurity and sparse agricultural statistics. EO-based monitoring systems require accurate cropland maps to provide information about croplands, but there is a lack of data to determine which of the many available land cover maps most accurately identify cropland in African countries. This study provides a quantitative evaluation and intercomparison of 11 publicly available land cover maps to assess their suitability for cropland classification and EO-based agriculture monitoring in Africa using statistically rigorous reference datasets from 8 countries. We hope the results of this study will help users determine the most suitable map for their needs and encourage future work to focus on resolving inconsistencies between maps and improving accuracy in low-accuracy regions.
    Few-Shot Personalized Saliency Prediction Using Tensor Regression for Preserving Structural Global Information. (arXiv:2307.02799v1 [eess.IV])
This paper presents a few-shot personalized saliency prediction method using tensor-to-matrix regression to preserve the structural global information of personalized saliency maps (PSMs). In contrast to a general saliency map, a PSM has great potential since it indicates person-specific visual attention, which is useful for inferring individual visual preferences from the heterogeneity of gazed areas. Predicting the PSM for an unseen image is necessary, but remains a challenging task due to the complexity of individual gaze patterns. To recognize individual gaze patterns from a limited amount of eye-tracking data, previous methods exploit the similarity of gaze tendencies between persons. However, in these methods the PSMs are vectorized for the prediction model, so the structural global information of the PSMs corresponding to the image is ignored. To automatically reveal the relationship between PSMs, we focus on a tensor-based regression model that can preserve the structural information of PSMs, thereby improving prediction accuracy. In the experimental results, we confirm that the proposed method, including the tensor-based regression, outperforms the comparative methods.
    Duality in Multi-View Restricted Kernel Machines. (arXiv:2305.17251v2 [cs.LG] UPDATED)
    We propose a unifying setting that combines existing restricted kernel machine methods into a single primal-dual multi-view framework for kernel principal component analysis in both supervised and unsupervised settings. We derive the primal and dual representations of the framework and relate different training and inference algorithms from a theoretical perspective. We show how to achieve full equivalence in primal and dual formulations by rescaling primal variables. Finally, we experimentally validate the equivalence and provide insight into the relationships between different methods on a number of time series data sets by recursively forecasting unseen test data and visualizing the learned features.
    Multi-gauge Hydrological Variational Data Assimilation: Regionalization Learning with Spatial Gradients using Multilayer Perceptron and Bayesian-Guided Multivariate Regression. (arXiv:2307.02497v1 [cs.LG])
    Tackling the difficult problem of estimating spatially distributed hydrological parameters, especially for floods on ungauged watercourses, this contribution presents a novel seamless regionalization technique for learning complex regional transfer functions designed for high-resolution hydrological models. The transfer functions rely on: (i) a multilayer perceptron enabling a seamless flow of gradient computation to employ machine learning optimization algorithms, or (ii) a multivariate regression mapping optimized by variational data assimilation algorithms and guided by Bayesian estimation, addressing the equifinality issue of feasible solutions. The approach involves incorporating the inferable regionalization mappings into a differentiable hydrological model and optimizing a cost function computed on multi-gauge data with accurate adjoint-based spatially distributed gradients.
    Balanced Filtering via Non-Disclosive Proxies. (arXiv:2306.15083v2 [cs.LG] UPDATED)
    We study the problem of non-disclosively collecting a sample of data that is balanced with respect to sensitive groups when group membership is unavailable or prohibited from use at collection time. Specifically, our collection mechanism does not reveal significantly more about group membership of any individual sample than can be ascertained from base rates alone. To do this, we adopt a fairness pipeline perspective, in which a learner can use a small set of labeled data to train a proxy function that can later be used for this filtering task. We then associate the range of the proxy function with sampling probabilities; given a new candidate, we classify it using our proxy function, and then select it for our sample with probability proportional to the sampling probability corresponding to its proxy classification. Importantly, we require that the proxy classification itself not reveal significant information about the sensitive group membership of any individual sample (i.e., it should be sufficiently non-disclosive). We show that under modest algorithmic assumptions, we find such a proxy in a sample- and oracle-efficient manner. Finally, we experimentally evaluate our algorithm and analyze generalization properties.
    Stability of Q-Learning Through Design and Optimism. (arXiv:2307.02632v1 [cs.LG])
Q-learning has become an important part of the reinforcement learning toolkit since its introduction in the dissertation of Chris Watkins in the 1980s. This paper is in part a tutorial on stochastic approximation and Q-learning, providing details regarding the INFORMS APS inaugural Applied Probability Trust Plenary Lecture, presented in Nancy, France, in June 2023. The paper also presents new approaches to ensure stability and potentially accelerated convergence for these algorithms, and for stochastic approximation in other settings. Two contributions are entirely new: 1. Stability of Q-learning with linear function approximation has been an open topic of research for over three decades. It is shown that with appropriate optimistic training in the form of a modified Gibbs policy, there exists a solution to the projected Bellman equation, and the algorithm is stable (in terms of bounded parameter estimates). Convergence remains one of many open topics for research. 2. The new Zap Zero algorithm is designed to approximate the Newton-Raphson flow without matrix inversion. It is stable and convergent under mild assumptions on the mean flow vector field for the algorithm, and a compatible statistical assumption on an underlying Markov chain. The algorithm is a general approach to stochastic approximation which in particular applies to Q-learning with "oblivious" training even with non-linear function approximation.
    Origin-Destination Travel Time Oracle for Map-based Services. (arXiv:2307.03048v1 [cs.LG])
Given an origin (O), a destination (D), and a departure time (T), an Origin-Destination (OD) travel time oracle (ODT-Oracle) returns an estimate of the time it takes to travel from O to D when departing at T. ODT-Oracles serve important purposes in map-based services. To enable the construction of such oracles, we provide a travel-time estimation (TTE) solution that leverages historical trajectories to estimate time-varying travel times for OD pairs. The problem is complicated by the fact that multiple historical trajectories with different travel times may connect an OD pair, while trajectories may vary from one another. To solve the problem, it is crucial to remove outlier trajectories when doing travel time estimation for future queries. We propose a novel, two-stage framework called Diffusion-based Origin-destination Travel Time Estimation (DOT), that solves the problem. First, DOT employs a conditioned Pixelated Trajectories (PiT) denoiser that enables building a diffusion-based PiT inference process by learning correlations between OD pairs and historical trajectories. Specifically, given an OD pair and a departure time, we aim to infer a PiT. Next, DOT encompasses a Masked Vision Transformer (MViT) that effectively and efficiently estimates a travel time based on the inferred PiT. We report on extensive experiments on two real-world datasets that offer evidence that DOT is capable of outperforming baseline methods in terms of accuracy, scalability, and explainability.
    Learning Curves for Heterogeneous Feature-Subsampled Ridge Ensembles. (arXiv:2307.03176v1 [stat.ML])
Feature bagging is a well-established ensembling method which aims to reduce prediction variance by training estimators in an ensemble on random subsamples or projections of features. Typically, ensembles are chosen to be homogeneous, in the sense that the number of feature dimensions available to an estimator is uniform across the ensemble. Here, we introduce heterogeneous feature ensembling, with estimators built on varying numbers of feature dimensions, and consider its performance in a linear regression setting. We study an ensemble of linear predictors, each fit using ridge regression on a subset of the available features. We allow the number of features included in these subsets to vary. Using the replica trick from statistical physics, we derive learning curves for ridge ensembles with deterministic linear masks. We obtain explicit expressions for the learning curves in the case of equicorrelated data with isotropic feature noise. Using the derived expressions, we investigate the effect of subsampling and ensembling, finding sharp transitions in the optimal ensembling strategy in the parameter space of noise level, data correlations, and data-task alignment. Finally, we suggest variable-dimension feature bagging as a strategy to mitigate double descent for robust machine learning in practice.
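A small sketch of a heterogeneous feature-subsampled ridge ensemble in the paper's sense; the subset sizes and regularization strength are assumed for illustration.

```python
# Sketch: a ridge ensemble whose members see different numbers of feature
# dimensions (a homogeneous ensemble would fix a single subset size).
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n, p = 200, 100
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) / np.sqrt(p) + 0.5 * rng.normal(size=n)

subset_sizes = [20, 40, 60, 80]          # heterogeneous member dimensions
members = []
for k in subset_sizes:
    idx = rng.choice(p, size=k, replace=False)   # this member's feature mask
    members.append((idx, Ridge(alpha=1.0).fit(X[:, idx], y)))

X_test = rng.normal(size=(50, p))
pred = np.mean([m.predict(X_test[:, idx]) for idx, m in members], axis=0)
```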
    Wasserstein Auto-Encoders of Merge Trees (and Persistence Diagrams). (arXiv:2307.02509v1 [cs.LG])
This paper presents a computational framework for the Wasserstein auto-encoding of merge trees (MT-WAE), a novel extension of the classical auto-encoder neural network architecture to the Wasserstein metric space of merge trees. In contrast to traditional auto-encoders which operate on vectorized data, our formulation explicitly manipulates merge trees on their associated metric space at each layer of the network, resulting in superior accuracy and interpretability. Our novel neural network approach can be interpreted as a non-linear generalization of previous linear attempts [65] at merge tree encoding. It also trivially extends to persistence diagrams. Extensive experiments on public ensembles demonstrate the efficiency of our algorithms, with MT-WAE computations on the order of minutes on average. We show the utility of our contributions in two applications adapted from previous work on merge tree encoding [65]. First, we apply MT-WAE to data reduction and reliably compress merge trees by concisely representing them with their coordinates in the final layer of our auto-encoder. Second, we document an application to dimensionality reduction, by exploiting the latent space of our auto-encoder, for the visual analysis of ensemble data. We illustrate the versatility of our framework by introducing two penalty terms, to help preserve in the latent space both the Wasserstein distances between merge trees, as well as their clusters. In both applications, quantitative experiments assess the relevance of our framework. Finally, we provide a C++ implementation that can be used for reproducibility.
    Temporal Difference Learning for High-Dimensional PIDEs with Jumps. (arXiv:2307.02766v1 [math.NA])
In this paper, we propose a deep learning framework for solving high-dimensional partial integro-differential equations (PIDEs) based on temporal difference learning. We introduce a set of Lévy processes and construct a corresponding reinforcement learning model. To simulate the entire process, we use deep neural networks to represent the solutions and non-local terms of the equations. Subsequently, we train the networks using the temporal difference error, the termination condition, and properties of the non-local terms as the loss function. The relative error of the method reaches $O(10^{-3})$ in 100-dimensional experiments and $O(10^{-4})$ in one-dimensional pure-jump problems. Additionally, our method demonstrates low computational cost and robustness, making it well-suited for addressing problems with different forms and intensities of jumps.
    A Privacy-Preserving Walk in the Latent Space of Generative Models for Medical Applications. (arXiv:2307.02984v1 [cs.LG])
Generative Adversarial Networks (GANs) have demonstrated their ability to generate synthetic samples that match a target distribution. However, from a privacy perspective, using GANs as a proxy for data sharing is not a safe solution, as they tend to embed near-duplicates of real samples in the latent space. Recent works, inspired by k-anonymity principles, address this issue through sample aggregation in the latent space, with the drawback of reducing the dataset by a factor of k. Our work aims to mitigate this problem by proposing a latent space navigation strategy able to generate diverse synthetic samples that may support effective training of deep models, while addressing privacy concerns in a principled way. Our approach leverages an auxiliary identity classifier as a guide to non-linearly walk between points in the latent space, minimizing the risk of collision with near-duplicates of real samples. We empirically demonstrate that, given any random pair of points in the latent space, our walking strategy is safer than linear interpolation. We then test our path-finding strategy combined with k-same methods and demonstrate, on two benchmarks for tuberculosis and diabetic retinopathy classification, that training a model using samples generated by our approach mitigates drops in performance, while preserving privacy.
    Diffusion Models for Computational Design at the Example of Floor Plans. (arXiv:2307.02511v1 [cs.LG])
AI image generators based on diffusion models have recently been widely discussed for their capability to create images from simple text prompts. However, for practical use in civil engineering they need to be able to create specific construction plans for given constraints. In this paper we explore the capabilities of these diffusion-based AI generators for computational design, using floor plans as an example, and identify their current limitations. We explain how diffusion models work and propose new diffusion models with improved semantic encoding. In several experiments we show that we can improve the validity of generated floor plans from 6% to 90%, as well as query performance, for different examples. We identify shortcomings, derive future research challenges for these models, and discuss the need to combine diffusion models with building information modelling. With this we provide key insights into the current state and future directions for diffusion models in civil engineering.
    Quantum Solutions to the Privacy vs. Utility Tradeoff. (arXiv:2307.03118v1 [quant-ph])
    In this work, we propose a novel architecture (and several variants thereof) based on quantum cryptographic primitives with provable privacy and security guarantees regarding membership inference attacks on generative models. Our architecture can be used on top of any existing classical or quantum generative models. We argue that the use of quantum gates associated with unitary operators provides inherent advantages compared to standard Differential Privacy based techniques for establishing guaranteed security from all polynomial-time adversaries.
    REAL: A Representative Error-Driven Approach for Active Learning. (arXiv:2307.00968v2 [cs.LG] UPDATED)
    Given a limited labeling budget, active learning (AL) aims to sample the most informative instances from an unlabeled pool to acquire labels for subsequent model training. To achieve this, AL typically measures the informativeness of unlabeled instances based on uncertainty and diversity. However, it does not consider erroneous instances with their neighborhood error density, which have great potential to improve the model performance. To address this limitation, we propose $REAL$, a novel approach to select data instances with $\underline{R}$epresentative $\underline{E}$rrors for $\underline{A}$ctive $\underline{L}$earning. It identifies minority predictions as \emph{pseudo errors} within a cluster and allocates an adaptive sampling budget for the cluster based on estimated error density. Extensive experiments on five text classification datasets demonstrate that $REAL$ consistently outperforms all best-performing baselines regarding accuracy and F1-macro scores across a wide range of hyperparameter settings. Our analysis also shows that $REAL$ selects the most representative pseudo errors that match the distribution of ground-truth errors along the decision boundary. Our code is publicly available at https://github.com/withchencheng/ECML_PKDD_23_Real.
    GIT: Detecting Uncertainty, Out-Of-Distribution and Adversarial Samples using Gradients and Invariance Transformations. (arXiv:2307.02672v1 [cs.LG])
    Deep neural networks tend to make overconfident predictions and often require additional detectors for misclassifications, particularly for safety-critical applications. Existing detection methods usually only focus on adversarial attacks or out-of-distribution samples as reasons for false predictions. However, generalization errors occur due to diverse reasons often related to poorly learning relevant invariances. We therefore propose GIT, a holistic approach for the detection of generalization errors that combines the usage of gradient information and invariance transformations. The invariance transformations are designed to shift misclassified samples back into the generalization area of the neural network, while the gradient information measures the contradiction between the initial prediction and the corresponding inherent computations of the neural network using the transformed sample. Our experiments demonstrate the superior performance of GIT compared to the state-of-the-art on a variety of network architectures, problem setups and perturbation types.
    A Finite-Particle Convergence Rate for Stein Variational Gradient Descent. (arXiv:2211.09721v4 [cs.LG] UPDATED)
    We provide the first finite-particle convergence rate for Stein variational gradient descent (SVGD), a popular algorithm for approximating a probability distribution with a collection of particles. Specifically, whenever the target distribution is sub-Gaussian with a Lipschitz score, SVGD with $n$ particles and an appropriate step size sequence drives the kernel Stein discrepancy to zero at an order $1/\sqrt{\log\log n}$ rate. We suspect that the dependence on $n$ can be improved, and we hope that our explicit, non-asymptotic proof strategy will serve as a template for future refinements.
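    For readers unfamiliar with the algorithm itself, a minimal NumPy sketch of one SVGD update with an RBF kernel follows; score_fn (the gradient of the log target density) and the fixed bandwidth are assumptions, and the paper's contribution is the convergence analysis, not this implementation.

        import numpy as np

        def svgd_step(particles, score_fn, step_size=0.1, bandwidth=1.0):
            # One SVGD update: particles move along the kernelized Stein direction
            # phi(x_i) = mean_j [ k(x_j, x_i) * score(x_j) + grad_{x_j} k(x_j, x_i) ].
            diffs = particles[:, None, :] - particles[None, :, :]       # (n, n, d)
            sq_dists = (diffs ** 2).sum(-1)                             # (n, n)
            k = np.exp(-sq_dists / (2 * bandwidth ** 2))                # RBF kernel
            grad_k = -diffs * k[..., None] / bandwidth ** 2             # grad in first arg
            scores = score_fn(particles)                                # (n, d)
            phi = (k[..., None] * scores[:, None, :] + grad_k).mean(0)  # average over j
            return particles + step_size * phi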
    Parameter-Efficient Fine-Tuning of LLaMA for the Clinical Domain. (arXiv:2307.03042v1 [cs.CL])
    Adapting pretrained language models to novel domains, such as clinical applications, traditionally involves retraining their entire set of parameters. However, this approach is increasingly impractical owing to the substantial computational requirements of training such large language models. To address this issue, Parameter-Efficient Fine-Tuning (PEFT) techniques offer a viable solution by selectively fine-tuning a small subset of additional parameters, significantly reducing the computational requirements for domain adaptation. In this study, we propose Clinical LLaMA-LoRA, a PEFT adapter layer built upon the open-sourced LLaMA model. Clinical LLaMA-LoRA is trained using clinical notes obtained from the MIMIC-IV database, thereby creating a specialised adapter designed for the clinical domain. Additionally, we propose a two-step PEFT framework which fuses Clinical LLaMA-LoRA with Downstream LLaMA-LoRA, another PEFT adapter specialised for downstream tasks. We evaluate this framework on multiple clinical outcome prediction datasets, comparing it to clinically trained language models. Our proposed framework achieves a state-of-the-art AUROC score averaged across all clinical downstream tasks. We observe substantial improvements of 6-9% in AUROC score on large-scale multilabel classification tasks, such as diagnosis and procedure classification.
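    A minimal sketch of the kind of LoRA adapter setup this builds on, using the Hugging Face peft library; the checkpoint name, rank, and target modules below are illustrative assumptions, not the paper's exact configuration.

        from transformers import AutoModelForCausalLM
        from peft import LoraConfig, get_peft_model

        # Illustrative LLaMA checkpoint and LoRA hyperparameters.
        base = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")
        config = LoraConfig(
            r=16, lora_alpha=32, lora_dropout=0.05,
            target_modules=["q_proj", "v_proj"],  # LLaMA attention projections
            task_type="CAUSAL_LM",
        )
        model = get_peft_model(base, config)
        model.print_trainable_parameters()  # only the low-rank adapter weights train

    Training then proceeds as usual (e.g., on clinical notes) with the frozen base model untouched; a second adapter for the downstream task could be fused on top, mirroring the two-step framework described above.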
    Generalizing Backpropagation for Gradient-Based Interpretability. (arXiv:2307.03056v1 [cs.LG])
    Many popular feature-attribution methods for interpreting deep neural networks rely on computing the gradients of a model's output with respect to its inputs. While these methods can indicate which input features may be important for the model's prediction, they reveal little about the inner workings of the model itself. In this paper, we observe that the gradient computation of a model is a special case of a more general formulation using semirings. This observation allows us to generalize the backpropagation algorithm to efficiently compute other interpretable statistics about the gradient graph of a neural network, such as the highest-weighted path and entropy. We implement this generalized algorithm, evaluate it on synthetic datasets to better understand the statistics it computes, and apply it to study BERT's behavior on the subject-verb number agreement task (SVA). With this method, we (a) validate that the amount of gradient flow through a component of a model reflects its importance to a prediction and (b) for SVA, identify which pathways of the self-attention mechanism are most important.
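    To make the semiring view concrete, here is a toy sketch (invented for illustration) in which path aggregation over a small gradient graph is parameterized by (add, mul): the usual (+, ×) recovers the total gradient, while (max, ×) recovers the highest-weighted path statistic mentioned above.

        def path_aggregate(graph, source, add, mul, one):
            # graph[u] = list of (v, local_gradient) edges; nodes are assumed
            # topologically ordered 0..n-1 with `source` first. Returns the
            # semiring aggregate of all path weights from source to each node.
            n = 1 + max(v for edges in graph.values() for v, _ in edges)
            value = {u: None for u in range(n)}
            value[source] = one
            for u in range(n):
                if value[u] is None:
                    continue
                for v, g in graph.get(u, []):
                    contrib = mul(value[u], g)
                    value[v] = contrib if value[v] is None else add(value[v], contrib)
            return value

        # Toy gradient graph: edge weights are local partial derivatives.
        graph = {0: [(1, 0.5), (2, 2.0)], 1: [(3, 3.0)], 2: [(3, 0.1)]}
        mul = lambda a, b: a * b
        total = path_aggregate(graph, 0, add=lambda a, b: a + b, mul=mul, one=1.0)
        best = path_aggregate(graph, 0, add=max, mul=mul, one=1.0)
        print(total[3], best[3])  # 1.7 (sum over paths) vs 1.5 (best single path)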
    CPDG: A Contrastive Pre-Training Method for Dynamic Graph Neural Networks. (arXiv:2307.02813v1 [cs.LG])
    Dynamic graph data mining has gained popularity in recent years due to the rich information contained in dynamic graphs and their widespread use in the real world. Despite the advances in dynamic graph neural networks (DGNNs), the rich information and diverse downstream tasks have posed significant difficulties for the practical application of DGNNs in industrial scenarios. To this end, we propose to address these difficulties by pre-training, and present the Contrastive Pre-Training Method for Dynamic Graph Neural Networks (CPDG). CPDG tackles the challenges of pre-training for DGNNs, including generalization and long-short term modeling capability, through a flexible structural-temporal subgraph sampler along with structural-temporal contrastive pre-training schemes. Extensive experiments conducted on both large-scale research and industrial dynamic graph datasets show that CPDG outperforms existing methods in dynamic graph pre-training for various downstream tasks under three transfer settings.
    Convex-Concave Min-Max Stackelberg Games. (arXiv:2110.05192v8 [cs.GT] UPDATED)
    Min-max optimization problems (i.e., min-max games) have been attracting a great deal of attention because of their applicability to a wide range of machine learning problems. Although significant progress has been made recently, the literature to date has focused on games with independent strategy sets; little is known about solving games with dependent strategy sets, which can be characterized as min-max Stackelberg games. We introduce two first-order methods that solve a large class of convex-concave min-max Stackelberg games, and show that our methods converge in polynomial time. Min-max Stackelberg games were first studied by Wald, under the posthumous name of Wald's maximin model, a variant of which is the main paradigm used in robust optimization, which means that our methods can likewise solve many convex robust optimization problems. We observe that the computation of competitive equilibria in Fisher markets also comprises a min-max Stackelberg game. Further, we demonstrate the efficacy and efficiency of our algorithms in practice by computing competitive equilibria in Fisher markets with varying utility structures. Our experiments suggest potential ways to extend our theoretical results, by demonstrating how different smoothness properties can affect the convergence rate of our algorithms.
    Modeling Content Creator Incentives on Algorithm-Curated Platforms. (arXiv:2206.13102v2 [cs.GT] UPDATED)
    Content creators compete for user attention. Their reach crucially depends on algorithmic choices made by developers on online platforms. To maximize exposure, many creators adapt strategically, as evidenced by examples like the sprawling search engine optimization industry. This begets competition for the finite user attention pool. We formalize these dynamics in what we call an exposure game, a model of incentives induced by algorithms, including modern factorization and (deep) two-tower architectures. We prove that seemingly innocuous algorithmic choices, e.g., non-negative vs. unconstrained factorization, significantly affect the existence and character of (Nash) equilibria in exposure games. We proffer the use of creator behavior models, like exposure games, for an (ex-ante) pre-deployment audit. Such an audit can identify misalignment between desirable and incentivized content, and thus complement post-hoc measures like content filtering and moderation. To this end, we propose tools for numerically finding equilibria in exposure games, and illustrate the results of an audit on the MovieLens and LastFM datasets. Among other findings, we show that strategically produced content exhibits strong dependence between algorithmic exploration and content diversity, and between model expressivity and bias towards gender-based user and creator groups.
    Distilling Large Vision-Language Model with Out-of-Distribution Generalizability. (arXiv:2307.03135v1 [cs.CV])
    Large vision-language models have achieved outstanding performance, but their size and computational requirements make their deployment on resource-constrained devices and time-sensitive tasks impractical. Model distillation, the process of creating smaller, faster models that maintain the performance of larger models, is a promising direction towards a solution. This paper investigates the distillation of visual representations in large teacher vision-language models into lightweight student models using a small- or mid-scale dataset. Notably, this study focuses on open-vocabulary out-of-distribution (OOD) generalization, a challenging problem that has been overlooked in the previous model distillation literature. We propose two principles, from the vision and language modality perspectives, to enhance the student's OOD generalization: (1) better imitating the teacher's visual representation space and carefully promoting coherence in vision-language alignment with the teacher; (2) enriching the teacher's language representations with informative and fine-grained semantic attributes to effectively distinguish between different labels. We propose several metrics and conduct extensive experiments to investigate these techniques. The results demonstrate significant improvements in zero-shot and few-shot student performance on open-vocabulary out-of-distribution classification, highlighting the effectiveness of our proposed approaches. Our code will be released at https://github.com/xuanlinli17/large_vlm_distillation_ood
    Evaluating the Evaluators: Are Current Few-Shot Learning Benchmarks Fit for Purpose?. (arXiv:2307.02732v1 [cs.LG])
    Numerous benchmarks for few-shot learning have been proposed in the last decade. However, all of these benchmarks focus on performance averaged over many tasks, and the question of how to reliably evaluate and tune models trained for individual tasks in this regime has not been addressed. This paper presents the first investigation into task-level evaluation -- a fundamental step when deploying a model. We measure the accuracy of performance estimators in the few-shot setting, consider strategies for model selection, and examine the reasons for the failure of evaluators usually thought of as being robust. We conclude that cross-validation with a low number of folds is the best choice for directly estimating the performance of a model, whereas bootstrapping or cross-validation with a large number of folds is better for model selection purposes. Overall, we find that existing benchmarks for few-shot learning are not designed in such a way that one can get a reliable picture of how effectively methods can be used on individual tasks.
    Fairness in Multi-Task Learning via Wasserstein Barycenters. (arXiv:2306.10155v2 [stat.ML] UPDATED)
    Algorithmic Fairness is an established field in machine learning that aims to reduce biases in data. Recent advances have proposed various methods to ensure fairness in a univariate environment, where the goal is to de-bias a single task. However, extending fairness to a multi-task setting, where more than one objective is optimised using a shared representation, remains underexplored. To bridge this gap, we develop a method that extends the definition of Strong Demographic Parity to multi-task learning using multi-marginal Wasserstein barycenters. Our approach provides a closed form solution for the optimal fair multi-task predictor including both regression and binary classification tasks. We develop a data-driven estimation procedure for the solution and run numerical experiments on both synthetic and real datasets. The empirical results highlight the practical value of our post-processing methodology in promoting fair decision-making.
    Dynamical Isometry based Rigorous Fair Neural Architecture Search. (arXiv:2307.02263v2 [cs.LG] UPDATED)
    Recently, the weight-sharing technique has significantly sped up the training and evaluation procedure of neural architecture search (NAS). However, most existing weight-sharing strategies are based solely on experience or observation, which leaves the search results lacking interpretability and rationality. In addition, because fairness is neglected, current methods are prone to misjudgments in module evaluation. To address these problems, we propose a novel neural architecture search algorithm based on dynamical isometry. We use fixed-point analysis from mean field theory to analyze the dynamical behavior of random neural networks at steady state, and show how dynamical isometry guarantees the fairness of weight-sharing-based NAS. Meanwhile, we prove that our module selection strategy is rigorously fair by estimating the generalization error of all modules with a well-conditioned Jacobian. Extensive experiments show that, at the same size, the architecture found by the proposed method achieves state-of-the-art top-1 validation accuracy on ImageNet classification. In addition, we demonstrate that our method achieves better and more stable training performance without loss of generality.
    Understanding Uncertainty Sampling. (arXiv:2307.02719v1 [cs.LG])
    Uncertainty sampling is a prevalent active learning algorithm that sequentially queries annotations for the data samples the current prediction model is most uncertain about. However, the usage of uncertainty sampling has been largely heuristic: (i) there is no consensus on the proper definition of "uncertainty" for a specific task under a specific loss; (ii) there is no theoretical guarantee that prescribes a standard protocol for implementing the algorithm, for example, how to handle the sequentially arriving annotated data under the framework of optimization algorithms such as stochastic gradient descent. In this work, we systematically examine uncertainty sampling algorithms under both stream-based and pool-based active learning. We propose a notion of equivalent loss, which depends on the uncertainty measure used and the original loss function, and establish that an uncertainty sampling algorithm essentially optimizes against such an equivalent loss. This perspective verifies the properness of existing uncertainty measures from two aspects: surrogate property and loss convexity. Furthermore, we propose a new notion for designing uncertainty measures called \textit{loss as uncertainty}: the idea is to use the conditional expected loss given the features as the uncertainty measure. Such an uncertainty measure has nice analytical properties and enough generality to cover both classification and regression problems, enabling us to provide the first generalization bound for uncertainty sampling algorithms under both stream-based and pool-based settings, in full generality of the underlying model and problem. Lastly, we establish connections between certain variants of uncertainty sampling algorithms and risk-sensitive objectives and distributional robustness, which can partly explain the advantage of uncertainty sampling algorithms when the sample size is small.
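    For reference, the most common pool-based variant looks like the sketch below (assuming a scikit-learn-style classifier exposing predict_proba); the paper's "loss as uncertainty" proposal would replace the entropy with an estimate of the conditional expected loss given the features.

        import numpy as np

        def entropy_uncertainty_sampling(model, pool_X, batch_size):
            # Query the pool instances whose predictive distribution has the
            # highest entropy, i.e., where the model is least certain.
            probs = model.predict_proba(pool_X)                    # (n, n_classes)
            entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
            return np.argsort(-entropy)[:batch_size]               # indices to annotate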
    VerifAI: Verified Generative AI. (arXiv:2307.02796v1 [cs.DB])
    Generative AI has made significant strides, yet concerns about the accuracy and reliability of its outputs continue to grow. Such inaccuracies can have serious consequences such as inaccurate decision-making, the spread of false information, privacy violations, legal liabilities, and more. Although efforts to address these risks are underway, including explainable AI and responsible AI practices such as transparency, privacy protection, bias mitigation, and social and environmental responsibility, misinformation caused by generative AI will remain a significant challenge. We propose that verifying the outputs of generative AI from a data management perspective is an emerging issue for generative AI. This involves analyzing the underlying data from multi-modal data lakes, including text files, tables, and knowledge graphs, and assessing its quality and consistency. By doing so, we can establish a stronger foundation for evaluating the outputs of generative AI models. Such an approach can ensure the correctness of generative AI, promote transparency, and enable decision-making with greater confidence. Our vision is to promote the development of verifiable generative AI and contribute to a more trustworthy and responsible use of AI.
    Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization. (arXiv:2303.03108v3 [cs.LG] UPDATED)
    Flat minima have recently been shown to be effective for improving generalization, and sharpness-aware minimization (SAM) achieves state-of-the-art performance. Yet the definition of flatness discussed in SAM and its follow-ups is limited to zeroth-order flatness (i.e., the worst-case loss within a perturbation radius). We show that zeroth-order flatness can be insufficient to discriminate minima with low generalization error from those with high generalization error, whether there is a single minimum or multiple minima within the given perturbation radius. We therefore present first-order flatness, a stronger measure of flatness focusing on the maximal gradient norm within a perturbation radius, which bounds both the maximal eigenvalue of the Hessian at local minima and the regularization function of SAM. We also present a novel training procedure named Gradient norm Aware Minimization (GAM) to seek minima with uniformly small curvature across all directions. Experimental results show that GAM improves the generalization of models trained with current optimizers such as SGD and AdamW on various datasets and networks. Furthermore, we show that GAM can help SAM find flatter minima and achieve better generalization.
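    A rough sketch of a first-order-flatness-style penalty follows: it adds the gradient norm, scaled by the radius, to the task loss via double backpropagation. This only illustrates the quantity GAM controls; the actual procedure maximizes the gradient norm within the rho-ball before penalizing it, and the coefficients here are invented.

        import torch

        def gradient_norm_penalized_loss(model, loss_fn, x, y, rho=0.05, lam=0.1):
            # Task loss plus a penalty on the gradient norm at the current point;
            # create_graph=True lets the penalty itself be backpropagated.
            params = [p for p in model.parameters() if p.requires_grad]
            loss = loss_fn(model(x), y)
            grads = torch.autograd.grad(loss, params, create_graph=True)
            grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
            return loss + lam * rho * grad_norm  # call .backward() on the result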
    Efficient and Feasible Robotic Assembly Sequence Planning via Graph Representation Learning. (arXiv:2303.10135v3 [cs.RO] UPDATED)
    Automatic Robotic Assembly Sequence Planning (RASP) can significantly improve productivity and resilience in modern manufacturing, along with the growing need for greater product customization. One of the main challenges in realizing such automation lies in efficiently finding solutions from a growing number of potential sequences for increasingly complex assemblies. Moreover, costly feasibility checks are always required for the robotic system. To address this, we propose a holistic graphical approach comprising a graph representation called Assembly Graph for product assemblies and a policy architecture, Graph Assembly Processing Network, dubbed GRACE, for assembly sequence generation. We use GRACE to extract meaningful information from the graph input and predict assembly sequences in a step-by-step manner. In experiments, we show that our approach can predict feasible assembly sequences across product variants of aluminum profiles based on data collected in simulation of a dual-armed robotic system. We further demonstrate that our method is capable of detecting infeasible assemblies, substantially alleviating the undesirable impact of false predictions and hence facilitating near-term real-world deployment. Code and training data are available at https://github.com/DLR-RM/GRACE.
    SLPerf: a Unified Framework for Benchmarking Split Learning. (arXiv:2304.01502v2 [cs.LG] UPDATED)
    Data privacy concerns have made centralized training on data scattered across silos infeasible, leading to the need for collaborative learning frameworks. In response, two prominent frameworks have emerged: federated learning (FL) and split learning (SL). While FL has established various benchmark frameworks and research libraries, SL currently lacks a unified library despite its diversity in terms of label sharing, model aggregation, and cut layer choice. This lack of standardization makes comparing SL paradigms difficult. To address this, we propose SLPerf, a unified research framework and open research library for SL, and conduct extensive experiments on four widely-used datasets under both IID and non-IID data settings. Our contributions include a comprehensive survey of recently proposed SL paradigms, a detailed benchmark comparison of different SL paradigms in different situations, and rich engineering take-away messages and research insights for improving SL paradigms. SLPerf can facilitate SL algorithm development and fair performance comparisons. The code is available at https://github.com/Rainysponge/Split-learning-Attacks .
    PCL-Indexability and Whittle Index for Restless Bandits with General Observation Models. (arXiv:2307.03034v1 [stat.ML])
    In this paper, we consider a general observation model for restless multi-armed bandit problems. The operation of the player needs to be based on a feedback mechanism that is error-prone due to resource constraints or environmental or intrinsic noise. By establishing a general probabilistic model for the dynamics of feedback/observation, we formulate the problem as a restless bandit with a countable belief state space, starting from an arbitrary initial belief (a priori information). We apply the achievable region method with partial conservation law (PCL) to the infinite-state problem and analyze its indexability and priority index (Whittle index). Finally, we propose an approximation process that transforms the problem into one to which the AG algorithm of Ni\~no-Mora and Bertsimas for finite-state problems can be applied. Numerical experiments show that our algorithm has excellent performance.
    Memory-efficient NLLB-200: Language-specific Expert Pruning of a Massively Multilingual Machine Translation Model. (arXiv:2212.09811v2 [cs.CL] UPDATED)
    Compared to conventional bilingual translation systems, massively multilingual machine translation is appealing because a single model can translate into multiple languages and benefit from knowledge transfer for low-resource languages. On the other hand, massively multilingual models suffer from the curse of multilinguality unless their size is scaled up massively, which increases their training and inference costs. Sparse Mixture-of-Experts models are a way to drastically increase model capacity without the need for a proportional amount of computing. The recently released NLLB-200 is an example of such a model. It covers 202 languages but requires at least four 32GB GPUs just for inference. In this work, we propose a pruning method that allows the removal of up to 80\% of experts with a negligible loss in translation quality, which makes it feasible to run the model on a single 32GB GPU. Further analysis suggests that our pruning metrics allow us to identify language-specific experts and prune experts that are not relevant for a given language pair.
    Intercomparison of Brown Dwarf Model Grids and Atmospheric Retrieval Using Machine Learning. (arXiv:2305.07719v2 [astro-ph.SR] UPDATED)
    Understanding differences between sub-stellar spectral data and models has proven to be a major challenge, especially for self-consistent model grids that are necessary for a thorough investigation of brown dwarf atmospheres. Using the supervised machine learning method of the random forest, we study the information content of 14 previously published model grids of brown dwarfs (from 1997 to 2021). The random forest method allows us to analyze the predictive power of these model grids, as well as interpret data within the framework of Approximate Bayesian Computation (ABC). Our curated dataset includes 3 benchmark brown dwarfs (Gl 570D, {\epsilon} Indi Ba and Bb) as well as a sample of 19 L and T dwarfs; this sample was previously analyzed in Lueber et al. (2022) using traditional Bayesian methods (nested sampling). We find that the effective temperature of a brown dwarf can be robustly predicted independent of the model grid chosen for the interpretation. However, inference of the surface gravity is model-dependent. Specifically, the BT-Settl, Sonora Bobcat and Sonora Cholla model grids tend to predict logg ~3-4 (cgs units) even after data blueward of 1.2 {\mu}m have been disregarded to mitigate for our incomplete knowledge of the shapes of alkali lines. Two major, longstanding challenges associated with understanding the influence of clouds in brown dwarf atmospheres remain: our inability to model them from first principles and also to robustly validate these models.
    When No-Rejection Learning is Optimal for Regression with Rejection. (arXiv:2307.02932v1 [cs.LG])
    Learning with rejection is a prototypical model for studying the interaction between humans and AI on prediction tasks. The model has two components, a predictor and a rejector. Upon the arrival of a sample, the rejector first decides whether to accept it; if accepted, the predictor fulfills the prediction task, and if rejected, the prediction is deferred to humans. The learning problem requires learning a predictor and a rejector simultaneously. This changes the structure of the conventional loss function and often results in non-convexity and inconsistency issues. For the classification-with-rejection problem, several works develop surrogate losses for the joint learning problem with provable consistency guarantees; in parallel, there has been less work on the regression counterpart. We study the regression with rejection (RwR) problem and investigate the no-rejection learning strategy, which treats the RwR problem as a standard regression task to learn the predictor. We establish that the suboptimality of the no-rejection learning strategy observed in the literature can be mitigated by enlarging the function class of the predictor. We then introduce the truncated loss to single out the learning of the predictor, and show that a consistent surrogate property can be established for the predictor individually, in an easier way than for the predictor and the rejector jointly. Our findings advocate a two-step learning procedure that first uses all the data to learn the predictor and then calibrates the prediction loss for the rejector. It is better aligned with the common intuition that more data samples lead to a better predictor, and it calls for more effort on the design of calibration algorithms for learning the rejector. While our discussion mainly focuses on the regression problem, the theoretical results and insights generalize to the classification problem as well.
    BaBE: Enhancing Fairness via Estimation of Latent Explaining Variables. (arXiv:2307.02891v1 [cs.LG])
    We consider the problem of unfair discrimination between two groups and propose a pre-processing method to achieve fairness. Corrective methods like statistical parity usually lead to poor accuracy and do not really achieve fairness in situations where there is a correlation between the sensitive attribute S and the legitimate attribute E (the explanatory variable) that should determine the decision. To overcome these drawbacks, other notions of fairness have been proposed, in particular conditional statistical parity and equal opportunity. However, E is often not directly observable in the data, i.e., it is a latent variable. We may observe some other variable Z representing E, but the problem is that Z may also be affected by S, hence Z itself can be biased. To deal with this problem, we propose BaBE (Bayesian Bias Elimination), an approach based on a combination of Bayesian inference and the Expectation-Maximization method, to estimate the most likely value of E for a given Z for each group. The decision can then be based directly on the estimated E. We show, by experiments on synthetic and real datasets, that our approach provides a good level of fairness as well as high accuracy.
    Trainable Weight Averaging: A General Approach for Subspace Training. (arXiv:2205.13104v2 [cs.LG] UPDATED)
    Training deep neural networks (DNNs) in low-dimensional subspaces is a promising direction for achieving efficient training and better generalization performance. Previous works extract the subspaces by using random projection or by performing dimensionality reduction on the training trajectory, but these methods can be inefficient or unstable in terms of dimensionality and numerical operations. In this paper, we connect subspace training to weight averaging and propose Trainable Weight Averaging (TWA), a general approach for subspace training that generalizes the previous efforts. TWA is efficient in terms of dimensionality and easy to use, making it a promising new method for subspace training. We further design an efficient scheme for subspace training that copes with large-scale problems, allowing parallel training across multiple nodes and evenly distributing the memory and computation burden across nodes. We apply TWA to efficient neural network training and to improving fine-tuning performance, demonstrating the efficiency and effectiveness of our approach. We conduct extensive experiments covering various benchmark computer vision and natural language processing tasks with various architectures. The implementation code is available at https://github.com/nblt/TWA.
    Dynamic Observation Policies in Observation Cost-Sensitive Reinforcement Learning. (arXiv:2307.02620v1 [cs.LG])
    Reinforcement learning (RL) has been shown to learn sophisticated control policies for complex tasks including games, robotics, heating and cooling systems, and text generation. The action-perception cycle in RL, however, generally assumes that a measurement of the state of the environment is available at each time step without cost. In applications such as deep-sea and planetary robot exploration, materials design, and medicine, however, there can be a high cost associated with measuring, or even approximating, the state of the environment. In this paper, we survey the recently growing literature that adopts the perspective that an RL agent might not need, or even want, a costly measurement at each time step. Within this context, we propose the Deep Dynamic Multi-Step Observationless Agent (DMSOA), contrast it with the literature, and empirically evaluate it on OpenAI Gym and Atari Pong environments. Our results show that DMSOA learns a better policy with fewer decision steps and measurements than the considered alternative from the literature.
    T-MARS: Improving Visual Representations by Circumventing Text Feature Learning. (arXiv:2307.03132v1 [cs.CV])
    Large web-sourced multimodal datasets have powered a slew of new methods for learning general-purpose visual representations, advancing the state of the art in computer vision and revolutionizing zero- and few-shot recognition. One crucial decision facing practitioners is how, if at all, to curate these ever-larger datasets. For example, the creators of the LAION-5B dataset chose to retain only image-caption pairs whose CLIP similarity score exceeded a designated threshold. In this paper, we propose a new state-of-the-art data filtering approach motivated by our observation that nearly 40% of LAION's images contain text that overlaps significantly with the caption. Intuitively, such data could be wasteful as it incentivizes models to perform optical character recognition rather than learning visual features. However, naively removing all such data could also be wasteful, as it throws away images that contain visual features (in addition to overlapping text). Our simple and scalable approach, T-MARS (Text Masking and Re-Scoring), filters out only those pairs where the text dominates the remaining visual features -- by first masking out the text and then filtering out those with a low CLIP similarity score of the masked image. Experimentally, T-MARS outperforms the top-ranked method on the "medium scale" of DataComp (a data filtering benchmark) by a margin of 6.5% on ImageNet and 4.7% on VTAB. Additionally, our systematic evaluation on various data pool sizes from 2M to 64M shows that the accuracy gains enjoyed by T-MARS linearly increase as data and compute are scaled exponentially. Code is available at https://github.com/locuslab/T-MARS.
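    A rough sketch of the mask-then-re-score step is below, assuming OpenAI's clip package; detect_text_boxes is a stub standing in for any off-the-shelf OCR text detector, and the gray fill and thresholding policy are likewise illustrative.

        import torch
        import clip  # OpenAI CLIP package
        from PIL import ImageDraw

        def detect_text_boxes(image):
            # Stub: plug in any OCR detector returning (x0, y0, x1, y1) boxes.
            return []

        def masked_clip_score(image, caption, model, preprocess, device="cpu"):
            # Mask detected text, then score the masked image against the caption;
            # pairs whose score collapses after masking were dominated by text.
            masked = image.copy()  # expects an RGB PIL image
            draw = ImageDraw.Draw(masked)
            for box in detect_text_boxes(image):
                draw.rectangle(box, fill=(127, 127, 127))
            with torch.no_grad():
                img = preprocess(masked).unsqueeze(0).to(device)
                txt = clip.tokenize([caption]).to(device)
                img_f = model.encode_image(img)
                txt_f = model.encode_text(txt)
                img_f = img_f / img_f.norm(dim=-1, keepdim=True)
                txt_f = txt_f / txt_f.norm(dim=-1, keepdim=True)
                return (img_f @ txt_f.T).item()  # cosine similarity

        # Usage: model, preprocess = clip.load("ViT-B/32"); drop low-scoring pairs.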
    Approximating the Shapley Value without Marginal Contributions. (arXiv:2302.00736v2 [cs.LG] UPDATED)
    The Shapley value is arguably the most popular approach for assigning a meaningful contribution value to players in a cooperative game, and it has recently been used intensively in explainable artificial intelligence. Its meaningfulness is due to axiomatic properties that only the Shapley value satisfies, which, however, come at the expense of an exact computation growing exponentially with the number of agents. Accordingly, a number of works are devoted to the efficient approximation of Shapley values, most of which revolve around the notion of an agent's marginal contribution. In this paper, we propose SVARM and Stratified SVARM, two parameter-free and domain-independent approximation algorithms based on a representation of the Shapley value detached from the notion of marginal contributions. We prove unmatched theoretical guarantees regarding their approximation quality and provide empirical results, including synthetic games as well as common explainability use cases, comparing against state-of-the-art methods.
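    For contrast, the classic marginal-contribution baseline that SVARM departs from is permutation sampling, sketched below; value_fn is any coalition value oracle, and the sample count is arbitrary.

        import random

        def shapley_permutation_estimate(value_fn, players, n_samples=1000, seed=0):
            # Average each player's marginal contribution over random join orders.
            rng = random.Random(seed)
            phi = {p: 0.0 for p in players}
            for _ in range(n_samples):
                order = list(players)
                rng.shuffle(order)
                coalition, prev = [], value_fn(frozenset())
                for p in order:
                    coalition.append(p)
                    cur = value_fn(frozenset(coalition))
                    phi[p] += (cur - prev) / n_samples
                    prev = cur
            return phi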
    Learning to reconstruct the bubble distribution with conductivity maps using Invertible Neural Networks and Error Diffusion. (arXiv:2307.02496v1 [eess.IV])
    Electrolysis is crucial for eco-friendly hydrogen production, but gas bubbles generated during the process hinder reactions, reduce cell efficiency, and increase energy consumption. Additionally, these gas bubbles cause changes in the conductivity inside the cell, resulting in corresponding variations in the induced magnetic field around the cell. Therefore, measuring these gas bubble-induced magnetic field fluctuations using external magnetic sensors and solving the inverse problem of the Biot-Savart law allows for estimating the conductivity in the cell and, thus, bubble size and location. However, determining high-resolution conductivity maps from only a few induced magnetic field measurements is an ill-posed inverse problem. To overcome this, we exploit Invertible Neural Networks (INNs) to reconstruct the conductivity field. Our qualitative results and quantitative evaluation using random error diffusion show that INNs achieve far superior performance compared to Tikhonov regularization.
    Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations. (arXiv:2206.04779v3 [cs.LG] UPDATED)
    Offline reinforcement learning has shown great promise in leveraging large pre-collected datasets for policy learning, allowing agents to forgo often-expensive online data collection. However, offline reinforcement learning from visual observations with continuous action spaces remains under-explored, with a limited understanding of the key challenges in this complex domain. In this paper, we establish simple baselines for continuous control in the visual domain and introduce a suite of benchmarking tasks for offline reinforcement learning from visual observations designed to better represent the data distributions present in real-world offline RL problems and guided by a set of desiderata for offline RL from visual observations, including robustness to visual distractions and visually identifiable changes in dynamics. Using this suite of benchmarking tasks, we show that simple modifications to two popular vision-based online reinforcement learning algorithms, DreamerV2 and DrQ-v2, suffice to outperform existing offline RL methods and establish competitive baselines for continuous control in the visual domain. We rigorously evaluate these algorithms and perform an empirical evaluation of the differences between state-of-the-art model-based and model-free offline RL methods for continuous control from visual observations. All code and data used in this evaluation are open-sourced to facilitate progress in this domain.
    Hybrid Ground-State Quantum Algorithms based on Neural Schr\"odinger Forging. (arXiv:2307.02633v1 [quant-ph])
    Entanglement-forging-based variational algorithms leverage the bipartition of quantum systems for addressing ground state problems. The primary limitation of these approaches lies in the exponential summation required over the numerous potential basis states, or bitstrings, when performing the Schmidt decomposition of the whole system. To overcome this challenge, we propose a new method for entanglement forging that employs generative neural networks to identify the most pertinent bitstrings, eliminating the need for the exponential sum. Through empirical demonstrations on systems of increasing complexity, we show that the proposed algorithm achieves comparable or superior performance to the existing standard implementation of entanglement forging. Moreover, by controlling the amount of required resources, this scheme can be applied to larger systems, as well as to non-permutation-invariant ones, the latter constraint being associated with the Heisenberg forging procedure. We substantiate our findings through numerical simulations conducted on spin models with one-dimensional ring and two-dimensional triangular lattice topologies, and on nuclear shell model configurations.
    Scaling In-Context Demonstrations with Structured Attention. (arXiv:2307.02690v1 [cs.CL])
    The recent surge of large language models (LLMs) highlights their ability to perform in-context learning, i.e., "learning" to perform a task from a few demonstrations in the context without any parameter updates. However, their capability for in-context learning is limited by the model architecture: 1) the use of demonstrations is constrained by a maximum sentence length due to positional embeddings; 2) the quadratic complexity of attention hinders users from using more demonstrations efficiently; 3) LLMs are shown to be sensitive to the order of the demonstrations. In this work, we tackle these challenges by proposing a better architectural design for in-context learning. We propose SAICL (Structured Attention for In-Context Learning), which replaces full attention with a structured attention mechanism designed for in-context learning, removing unnecessary dependencies between individual demonstrations while making the model invariant to the permutation of demonstrations. We evaluate SAICL in a meta-training framework and show that SAICL achieves comparable or better performance than full attention while obtaining up to a 3.4x inference speed-up. SAICL also consistently outperforms a strong Fusion-in-Decoder (FiD) baseline which processes each demonstration independently. Finally, thanks to its linear nature, we demonstrate that SAICL can easily scale to hundreds of demonstrations, with performance gains that continue to grow with scale.
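    A minimal sketch of the kind of attention mask such a design implies (our reading of the idea, not the paper's released code): each demonstration attends only within itself while the query attends to everything, which removes cross-demonstration dependencies and makes the layer order-invariant.

        import torch

        def structured_icl_mask(demo_lens, query_len):
            # Boolean mask, True = attention allowed. Demonstrations see only
            # themselves; the query tokens see all demonstrations and themselves.
            total = sum(demo_lens) + query_len
            mask = torch.zeros(total, total, dtype=torch.bool)
            start = 0
            for n in demo_lens:
                mask[start:start + n, start:start + n] = True  # within-demo block
                start += n
            mask[start:, :] = True  # query rows attend everywhere
            return mask

        # e.g. two demos of 3 and 2 tokens plus a 4-token query:
        print(structured_icl_mask([3, 2], 4).int())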
    A Detailed Study of Interpretability of Deep Neural Network based Top Taggers. (arXiv:2210.04371v4 [hep-ex] UPDATED)
    Recent developments in the methods of explainable AI (XAI) allow researchers to explore the inner workings of deep neural networks (DNNs), revealing crucial information about input-output relationships and realizing how data connects with machine learning models. In this paper we explore interpretability of DNN models designed to identify jets coming from top quark decay in high energy proton-proton collisions at the Large Hadron Collider (LHC). We review a subset of existing top tagger models and explore different quantitative methods to identify which features play the most important roles in identifying the top jets. We also investigate how and why feature importance varies across different XAI metrics, how correlations among features impact their explainability, and how latent space representations encode information as well as correlate with physically meaningful quantities. Our studies uncover some major pitfalls of existing XAI methods and illustrate how they can be overcome to obtain consistent and meaningful interpretation of these models. We additionally illustrate the activity of hidden layers as Neural Activation Pattern (NAP) diagrams and demonstrate how they can be used to understand how DNNs relay information across the layers and how this understanding can help to make such models significantly simpler by allowing effective model reoptimization and hyperparameter tuning. These studies not only facilitate a methodological approach to interpreting models but also unveil new insights about what these models learn. Incorporating these observations into augmented model design, we propose the Particle Flow Interaction Network (PFIN) model and demonstrate how interpretability-inspired model augmentation can improve top tagging performance.
    FedVS: Straggler-Resilient and Privacy-Preserving Vertical Federated Learning for Split Models. (arXiv:2304.13407v3 [cs.LG] UPDATED)
    In a vertical federated learning (VFL) system consisting of a central server and many distributed clients, the training data are vertically partitioned such that different features are privately stored on different clients. The problem of split VFL is to train a model split between the server and the clients. This paper aims to address two major challenges in split VFL: 1) performance degradation due to straggling clients during training; and 2) data and model privacy leakage from clients' uploaded data embeddings. We propose FedVS to simultaneously address these two challenges. The key idea of FedVS is to design secret sharing schemes for the local data and models, such that information-theoretical privacy against colluding clients and curious server is guaranteed, and the aggregation of all clients' embeddings is reconstructed losslessly, via decrypting computation shares from the non-straggling clients. Extensive experiments on various types of VFL datasets (including tabular, CV, and multi-view) demonstrate the universal advantages of FedVS in straggler mitigation and privacy protection over baseline protocols.
    An explainable model to support the decision about the therapy protocol for AML. (arXiv:2307.02631v1 [cs.LG])
    Acute Myeloid Leukemia (AML) is one of the most aggressive types of hematological neoplasm. To support specialists' decisions about the appropriate therapy, patients with AML receive a prognosis of outcomes according to their cytogenetic and molecular characteristics, often divided into three risk categories: favorable, intermediate, and adverse. However, the current risk classification has known problems, such as heterogeneity between patients of the same risk group and no clear definition of the intermediate risk category. Moreover, as most patients with AML receive an intermediate-risk classification, specialists often demand other tests and analyses, leading to delayed treatment and worsening of the patient's clinical condition. This paper presents the data analysis and an explainable machine-learning model to support the decision about the most appropriate therapy protocol according to the patient's survival prediction. Beyond being explainable, the model achieves promising results, indicating that it can be used to support specialists' decisions safely. Most importantly, the findings offered in this study have the potential to open new avenues of research toward better treatments and prognostic markers.
    Can Domain Adaptation Improve Accuracy and Fairness of Skin Lesion Classification?. (arXiv:2307.03157v1 [cs.CV])
    Deep learning-based diagnostic systems have demonstrated potential in classifying skin cancer conditions when labeled training examples are abundant. However, skin lesion analysis often suffers from a scarcity of labeled data, hindering the development of an accurate and reliable diagnostic system. In this work, we leverage multiple skin lesion datasets and investigate the feasibility of various unsupervised domain adaptation (UDA) methods in binary and multi-class skin lesion classification. In particular, we assess three UDA training schemes: single-source, combined-source, and multi-source. Our experimental results show that UDA is effective in binary classification, with further improvement observed when imbalance is mitigated. In the multi-class task, its performance is less prominent, and the imbalance problem again needs to be addressed to achieve above-baseline accuracy. Through our quantitative analysis, we find that the test error of multi-class tasks is strongly correlated with label shift, and that feature-level UDA methods have limitations when handling imbalanced datasets. Finally, our study reveals that UDA can effectively reduce bias against minority groups and promote fairness, even without the explicit use of fairness-focused techniques.
    An AI tool for automated analysis of large-scale unstructured clinical cine CMR databases. (arXiv:2206.08137v2 [eess.IV] UPDATED)
    Artificial intelligence (AI) techniques have been proposed for automating analysis of short axis (SAX) cine cardiac magnetic resonance (CMR), but no CMR analysis tool exists to automatically analyse large (unstructured) clinical CMR datasets. We develop and validate a robust AI tool for start-to-end automatic quantification of cardiac function from SAX cine CMR in large clinical databases. Our pipeline for processing and analysing CMR databases includes automated steps to identify the correct data, robust image pre-processing, an AI algorithm for biventricular segmentation of SAX CMR and estimation of functional biomarkers, and automated post-analysis quality control to detect and correct errors. The segmentation algorithm was trained on 2793 CMR scans from two NHS hospitals and validated on additional cases from this dataset (n=414) and five external datasets (n=6888), including scans of patients with a range of diseases acquired at 12 different centres using CMR scanners from all major vendors. Median absolute errors in cardiac biomarkers were within the range of inter-observer variability: <8.4mL (left ventricle volume), <9.2mL (right ventricle volume), <13.3g (left ventricular mass), and <5.9% (ejection fraction) across all datasets. Stratification of cases according to phenotypes of cardiac disease and scanner vendors showed good performance across all groups. We show that our proposed tool, which combines image pre-processing steps, a domain-generalisable AI algorithm trained on a large-scale multi-domain CMR dataset and quality control steps, allows robust analysis of (clinical or research) databases from multiple centres, vendors, and cardiac diseases. This enables translation of our tool for use in fully-automated processing of large multi-centre databases.
    Focused Transformer: Contrastive Training for Context Scaling. (arXiv:2307.03170v1 [cs.CL])
    Large language models have an exceptional capability to incorporate new information in a contextual manner. However, the full potential of such an approach is often restrained by a limitation in the effective context length. One solution to this issue is to endow an attention layer with access to an external memory, which comprises (key, value) pairs. Yet, as the number of documents increases, the proportion of relevant keys to irrelevant ones decreases, leading the model to focus more on the irrelevant keys. We identify a significant challenge, dubbed the distraction issue, where keys linked to different semantic values might overlap, making them hard to distinguish. To tackle this problem, we introduce the Focused Transformer (FoT), a technique that employs a training process inspired by contrastive learning. This novel approach enhances the structure of the (key, value) space, enabling an extension of the context length. Our method allows for fine-tuning pre-existing, large-scale models to lengthen their effective context. This is demonstrated by our fine-tuning of $3B$ and $7B$ OpenLLaMA checkpoints. The resulting models, which we name LongLLaMA, exhibit advancements in tasks requiring a long context. We further illustrate that our LongLLaMA models adeptly manage a $256k$ context length for passkey retrieval.
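    A minimal sketch of attending over an external (key, value) memory via top-k retrieval, the general mechanism FoT builds on; the single-head layout, scaling, and top_k value are simplifying assumptions rather than the paper's architecture.

        import torch
        import torch.nn.functional as F

        def memory_attention(query, mem_keys, mem_values, top_k=64):
            # query: (q, d); mem_keys, mem_values: (m, d) with m >= top_k.
            scores = query @ mem_keys.T / mem_keys.shape[-1] ** 0.5  # (q, m)
            top_scores, idx = scores.topk(top_k, dim=-1)             # (q, top_k)
            weights = F.softmax(top_scores, dim=-1)
            # Gather the retrieved values and take their weighted average.
            return torch.einsum("qk,qkd->qd", weights, mem_values[idx])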
    Kernels, Data & Physics. (arXiv:2307.02693v1 [cs.LG])
    Lecture notes from the course given by Professor Julia Kempe at the summer school "Statistical physics of Machine Learning" in Les Houches. The notes discuss the so-called NTK approach to problems in machine learning, which consists of gaining an understanding of generally unsolvable problems by finding a tractable kernel formulation. The notes are mainly focused on practical applications such as data distillation and adversarial robustness, examples of inductive bias are also discussed.
    Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching. (arXiv:2205.03447v7 [cs.AI] UPDATED)
    Ontology Matching (OM) plays an important role in many domains such as bioinformatics and the Semantic Web, and its research is becoming increasingly popular, especially with the application of machine learning (ML) techniques. Although the Ontology Alignment Evaluation Initiative (OAEI) represents an impressive effort for the systematic evaluation of OM systems, it still suffers from several limitations including limited evaluation of subsumption mappings, suboptimal reference mappings, and limited support for the evaluation of ML-based systems. To tackle these limitations, we introduce five new biomedical OM tasks involving ontologies extracted from Mondo and UMLS. Each task includes both equivalence and subsumption matching; the quality of reference mappings is ensured by human curation, ontology pruning, etc.; and a comprehensive evaluation framework is proposed to measure OM performance from various perspectives for both ML-based and non-ML-based OM systems. We report evaluation results for OM systems of different types to demonstrate the usage of these resources, all of which are publicly available as part of the new BioML track at OAEI 2022.
    The Role of Subgroup Separability in Group-Fair Medical Image Classification. (arXiv:2307.02791v1 [cs.CV])
    We investigate performance disparities in deep classifiers. We find that the ability of classifiers to separate individuals into subgroups varies substantially across medical imaging modalities and protected characteristics; crucially, we show that this property is predictive of algorithmic bias. Through theoretical analysis and extensive empirical evaluation, we find a relationship between subgroup separability, subgroup disparities, and performance degradation when models are trained on data with systematic bias such as underdiagnosis. Our findings shed new light on the question of how models become biased, providing important insights for the development of fair medical imaging AI.
    Learning to Predict Navigational Patterns from Partial Observations. (arXiv:2304.13242v2 [cs.CV] UPDATED)
    Human beings cooperatively navigate rule-constrained environments by adhering to mutually known navigational patterns, which may be represented as directional pathways or road lanes. Inferring these navigational patterns from incompletely observed environments is required for intelligent mobile robots operating in unmapped locations. However, algorithmically defining these navigational patterns is nontrivial. This paper presents the first self-supervised learning (SSL) method for learning to infer navigational patterns in real-world environments from partial observations only. We explain how geometric data augmentation, predictive world modeling, and an information-theoretic regularizer enables our model to predict an unbiased local directional soft lane probability (DSLP) field in the limit of infinite data. We demonstrate how to infer global navigational patterns by fitting a maximum likelihood graph to the DSLP field. Experiments show that our SSL model outperforms two SOTA supervised lane graph prediction models on the nuScenes dataset. We propose our SSL method as a scalable and interpretable continual learning paradigm for navigation by perception. Code is available at https://github.com/robin-karlsson0/dslp.
    Learning Multi-Agent Intention-Aware Communication for Optimal Multi-Order Execution in Finance. (arXiv:2307.03119v1 [cs.AI])
    Order execution is a fundamental task in quantitative finance, aiming at finishing acquisition or liquidation for a number of trading orders of specific assets. Recent advances in model-free reinforcement learning (RL) provide a data-driven solution to the order execution problem. However, existing works always optimize execution for an individual order, overlooking the practice that multiple orders are specified to execute simultaneously, resulting in suboptimality and bias. In this paper, we first present a multi-agent RL (MARL) method for multi-order execution under practical constraints. Specifically, we treat every agent as an individual operator trading one specific order, while communicating with the other agents and collaborating to maximize overall profits. Nevertheless, existing MARL algorithms often incorporate communication among agents by exchanging only the information of their partial observations, which is inefficient in complicated financial markets. To improve collaboration, we therefore propose a learnable multi-round communication protocol in which the agents communicate their intended actions with each other and refine them accordingly. It is optimized through a novel action value attribution method which is provably consistent with the original learning objective yet more efficient. Experiments on data from two real-world markets illustrate superior performance, with significantly better collaboration effectiveness achieved by our method.
    An Automatic Guidance and Quality Assessment System for Doppler Imaging of Umbilical Artery. (arXiv:2304.05463v2 [eess.IV] UPDATED)
    Examination of the umbilical artery with Doppler ultrasonography is performed to investigate blood supply to the fetus through the umbilical cord, which is vital for monitoring fetal health. Such examination involves several steps that must be performed correctly: identifying suitable sites on the umbilical artery for the measurement, acquiring the blood flow curve in the form of a Doppler spectrum, and ensuring compliance with a set of quality standards. These steps rely heavily on the operator's skill, and the shortage of experienced sonographers has thus created a demand for machine assistance. In this work, we propose an automatic system to fill this gap. Using a modified Faster R-CNN network, we obtain an algorithm that can suggest locations suitable for Doppler measurement. We have also developed a method for assessing the quality of the Doppler spectrum. The proposed system is validated on 657 images from a national ultrasound screening database, with results demonstrating its potential as a guidance system.
    ZeroFlow: Fast Zero Label Scene Flow via Distillation. (arXiv:2305.10424v4 [cs.CV] UPDATED)
    Scene flow estimation is the task of describing the 3D motion field between temporally successive point clouds. State-of-the-art methods use strong priors and test-time optimization techniques, but require on the order of tens of seconds for large-scale point clouds, making them unusable as computer vision primitives for real-time applications such as open world object detection. Feed forward methods are considerably faster, running on the order of tens to hundreds of milliseconds for large-scale point clouds, but require expensive human supervision. To address both limitations, we propose Scene Flow via Distillation, a simple distillation framework that uses a label-free optimization method to produce pseudo-labels to supervise a feed forward model. Our instantiation of this framework, ZeroFlow, produces scene flow estimates in real-time on large-scale point clouds at quality competitive with state-of-the-art methods while using zero human labels. Notably, at test-time ZeroFlow is over 1000$\times$ faster than label-free state-of-the-art optimization-based methods on large-scale point clouds and over 1000$\times$ cheaper to train on unlabeled data compared to the cost of human annotation of that data. To facilitate research reuse, we release our code, trained model weights, and high quality pseudo-labels for the Argoverse 2 and Waymo Open datasets.
    PathProx: A Proximal Gradient Algorithm for Weight Decay Regularized Deep Neural Networks. (arXiv:2210.03069v4 [cs.LG] UPDATED)
    Weight decay is one of the most widely used forms of regularization in deep learning, and has been shown to improve generalization and robustness. The optimization objective driving weight decay is a sum of losses plus a term proportional to the sum of squared weights. This paper argues that stochastic gradient descent (SGD) may be an inefficient algorithm for this objective. For neural networks with ReLU activations, solutions to the weight decay objective are equivalent to those of a different objective in which the regularization term is instead a sum of products of $\ell_2$ (not squared) norms of the input and output weights associated with each ReLU neuron. This alternative (and effectively equivalent) regularization suggests a novel proximal gradient algorithm for network training. Theory and experiments support the new training approach, showing that it can converge much faster to the sparse solutions it shares with standard weight decay training.
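The rescaled objective replaces the sum of squared weights with a sum of products of unsquared $\ell_2$ norms, a non-smooth penalty handled naturally by a proximal step. The sketch below shows only the standard block soft-thresholding prox for a generic group $\ell_2$ norm, the building block such a proximal algorithm rests on; the per-neuron coupling of input and output weights in PathProx itself is more involved.

```python
import torch

def group_norm_prox(w, thresh):
    """Proximal operator of thresh * ||w||_2 for one weight group:
    shrink the group's norm toward zero (block soft-thresholding).
    Groups whose norm falls below thresh are zeroed out entirely,
    which is how sparse solutions emerge."""
    scale = torch.clamp(1.0 - thresh / (w.norm() + 1e-12), min=0.0)
    return w * scale

def proximal_gradient_step(groups, grads, lr, lam):
    """One proximal-gradient step: an ordinary gradient step on the
    (smooth) loss, followed by the prox of the non-smooth regularizer."""
    return [group_norm_prox(w - lr * g, lr * lam)
            for w, g in zip(groups, grads)]
```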
    Through the Fairness Lens: Experimental Analysis and Evaluation of Entity Matching. (arXiv:2307.02726v1 [cs.DB])
Entity matching (EM) is a challenging problem studied by different communities for over half a century. Algorithmic fairness has also become a timely topic for addressing machine bias and its societal impacts. Despite extensive research on these two topics, little attention has been paid to the fairness of entity matching. Towards addressing this gap, we perform an extensive experimental evaluation of a variety of EM techniques in this paper. We generated two social datasets from publicly available datasets for the purpose of auditing EM through the lens of fairness. Our findings underscore potential unfairness under two common conditions in real-world societies: (i) when some demographic groups are overrepresented, and (ii) when names are more similar in some groups compared to others. Among our many findings, it is noteworthy that while various fairness definitions are valuable for different settings, due to the class-imbalanced nature of EM, measures such as positive predictive value parity and true positive rate parity are, in general, more capable of revealing EM unfairness.
    DiffusionDB: A Large-scale Prompt Gallery Dataset for Text-to-Image Generative Models. (arXiv:2210.14896v4 [cs.CV] UPDATED)
    With recent advancements in diffusion models, users can generate high-quality images by writing text prompts in natural language. However, generating images with desired details requires proper prompts, and it is often unclear how a model reacts to different prompts or what the best prompts are. To help researchers tackle these critical challenges, we introduce DiffusionDB, the first large-scale text-to-image prompt dataset totaling 6.5TB, containing 14 million images generated by Stable Diffusion, 1.8 million unique prompts, and hyperparameters specified by real users. We analyze the syntactic and semantic characteristics of prompts. We pinpoint specific hyperparameter values and prompt styles that can lead to model errors and present evidence of potentially harmful model usage, such as the generation of misinformation. The unprecedented scale and diversity of this human-actuated dataset provide exciting research opportunities in understanding the interplay between prompts and generative models, detecting deepfakes, and designing human-AI interaction tools to help users more easily use these models. DiffusionDB is publicly available at: https://poloclub.github.io/diffusiondb.
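For readers who want to explore the dataset, the project page describes hosting on the Hugging Face Hub. A minimal loading sketch follows; the repository ID, config name, and column names are taken from that documentation and should be treated as assumptions that may have changed.

```python
from datasets import load_dataset

# A small random subset for exploration; the project documents larger
# configs (up to the full 14M-image split) under the same repository.
subset = load_dataset("poloclub/diffusiondb", "2m_random_1k", split="train")

example = subset[0]
print(example["prompt"])                                    # user-written text prompt
print(example["cfg"], example["step"], example["sampler"])  # sampling hyperparameters
example["image"].show()                                     # the generated image (PIL)
```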
    Loss Functions and Metrics in Deep Learning. A Review. (arXiv:2307.02694v1 [cs.LG])
One of the essential components of deep learning is the choice of the loss function and the performance metrics used to train and evaluate models. This paper reviews the most prevalent loss functions and performance metrics in deep learning. We examine the benefits and limitations of each technique and illustrate their application to various deep-learning problems. Our review aims to give a comprehensive picture of the different loss functions and performance indicators used in the most common deep learning tasks and to help practitioners choose the best method for their specific task.  ( 2 min )
    Human Inspired Progressive Alignment and Comparative Learning for Grounded Word Acquisition. (arXiv:2307.02615v1 [cs.CL])
Human language acquisition is an efficient, supervised, and continual process. In this work, we take inspiration from how human babies acquire their first language and develop a computational process for word acquisition through comparative learning. Motivated by cognitive findings, we generated a small dataset that enables computational models to compare the similarities and differences of various attributes, and to learn to filter out and extract the common information for each shared linguistic label. We frame word acquisition not only as an information filtering process, but also as a representation-symbol mapping. This procedure involves neither a fixed vocabulary size nor a discriminative objective, and allows the models to continually learn more concepts efficiently. Our results in controlled experiments show the potential of this approach for efficient continual learning of grounded words.  ( 2 min )
    Spotting Virus from Satellites: Modeling the Circulation of West Nile Virus Through Graph Neural Networks. (arXiv:2209.05251v2 [cs.CV] UPDATED)
The occurrence of West Nile Virus (WNV) represents one of the most common mosquito-borne zoonotic viral infections. Its circulation is usually associated with climatic and environmental conditions suitable for vector proliferation and virus replication. Several statistical models have been developed to shape and forecast WNV circulation: in particular, the recent massive availability of Earth Observation (EO) data, coupled with continuous advances in the field of Artificial Intelligence, offers valuable opportunities. In this paper, we seek to predict WNV circulation by feeding Deep Neural Networks (DNNs) with satellite images, which have been extensively shown to hold environmental and climatic features. Notably, while previous approaches analyze each geographical site independently, we propose a spatially aware approach that also considers the characteristics of nearby sites. Specifically, we build upon Graph Neural Networks (GNNs) to aggregate features from neighbouring places, and further extend these modules to consider multiple relations, such as the difference in temperature and soil moisture between two sites, as well as the geographical distance. Moreover, we inject time-related information directly into the model to account for the seasonality of virus spread. We design an experimental setting that combines satellite images - from the Landsat and Sentinel missions - with ground-truth observations of WNV circulation in Italy. We show that our proposed Multi-Adjacency Graph Attention Network (MAGAT) consistently leads to higher performance when paired with an appropriate pre-training stage. Finally, we assess the importance of each component of MAGAT in our ablation studies.
    DeepOnto: A Python Package for Ontology Engineering with Deep Learning. (arXiv:2307.03067v1 [cs.AI])
Applying deep learning techniques, particularly language models (LMs), in ontology engineering has attracted widespread attention. However, deep learning frameworks like PyTorch and TensorFlow are predominantly developed for Python programming, while widely-used ontology APIs, such as the OWL API and Jena, are primarily Java-based. To facilitate seamless integration of these frameworks and APIs, we present Deeponto, a Python package designed for ontology engineering. The package encompasses a core ontology processing module founded on the widely-recognised and reliable OWL API, encapsulating its fundamental features in a more "Pythonic" manner and extending its capabilities to include other essential components such as reasoning, verbalisation, normalisation, and projection, among others. Building on this module, Deeponto offers a suite of tools, resources, and algorithms that support various ontology engineering tasks, such as ontology alignment and completion, by harnessing deep learning methodologies, primarily pre-trained LMs. In this paper, we also demonstrate the practical utility of Deeponto through two use cases: Digital Health Coaching at Samsung Research UK and the Bio-ML track of the Ontology Alignment Evaluation Initiative (OAEI).  ( 2 min )
    Denoise Pretraining on Nonequilibrium Molecules for Accurate and Transferable Neural Potentials. (arXiv:2303.02216v2 [cs.LG] UPDATED)
Recent advances in equivariant graph neural networks (GNNs) have made deep learning amenable to developing fast surrogate models for expensive ab initio quantum mechanics (QM) approaches to molecular potential prediction. However, building accurate and transferable potential models using GNNs remains challenging, as the data is greatly limited by the expensive computational costs and level of theory of QM methods, especially for large and complex molecular systems. In this work, we propose denoise pretraining on nonequilibrium molecular conformations to achieve more accurate and transferable GNN potential predictions. Specifically, atomic coordinates of sampled nonequilibrium conformations are perturbed by random noise, and GNNs are pretrained to denoise the perturbed molecular conformations, recovering the original coordinates. Rigorous experiments on multiple benchmarks reveal that pretraining significantly improves the accuracy of neural potentials. Furthermore, we show that the proposed pretraining approach is model-agnostic, as it improves the performance of different invariant and equivariant GNNs. Notably, our models pretrained on small molecules demonstrate remarkable transferability, improving performance when fine-tuned on diverse molecular systems, including different elements, charged molecules, biomolecules, and larger systems. These results highlight the potential of leveraging denoise pretraining approaches to build more generalizable neural potentials for complex molecular systems.  ( 3 min )
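The pretraining objective itself reduces to a few lines. Here is a minimal sketch, assuming a placeholder equivariant model `gnn` that outputs one 3-vector per atom; whether the regression target is the injected noise or the clean coordinates is an implementation detail that varies across denoising-pretraining papers.

```python
import torch

def denoise_pretrain_step(gnn, coords, atom_types, noise_std=0.1):
    """One denoise-pretraining step: perturb the coordinates of a
    (nonequilibrium) conformation with Gaussian noise and train the
    network to predict the noise, i.e. to recover the original geometry."""
    noise = noise_std * torch.randn_like(coords)
    pred_noise = gnn(coords + noise, atom_types)   # (n_atoms, 3)
    return torch.nn.functional.mse_loss(pred_noise, noise)
```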
    Principal subbundles for dimension reduction. (arXiv:2307.03128v1 [stat.ME])
    In this paper we demonstrate how sub-Riemannian geometry can be used for manifold learning and surface reconstruction by combining local linear approximations of a point cloud to obtain lower dimensional bundles. Local approximations obtained by local PCAs are collected into a rank $k$ tangent subbundle on $\mathbb{R}^d$, $k<d$, which we call a principal subbundle. This determines a sub-Riemannian metric on $\mathbb{R}^d$. We show that sub-Riemannian geodesics with respect to this metric can successfully be applied to a number of important problems, such as: explicit construction of an approximating submanifold $M$, construction of a representation of the point-cloud in $\mathbb{R}^k$, and computation of distances between observations, taking the learned geometry into account. The reconstruction is guaranteed to equal the true submanifold in the limit case where tangent spaces are estimated exactly. Via simulations, we show that the framework is robust when applied to noisy data. Furthermore, the framework generalizes to observations on an a priori known Riemannian manifold.  ( 2 min )
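The local ingredient here is ordinary PCA on neighborhoods. A small numpy sketch of estimating one rank-$k$ tangent frame follows; collecting such frames over $\mathbb{R}^d$ is what the paper assembles into a principal subbundle (the sub-Riemannian machinery built on top is not shown).

```python
import numpy as np

def local_tangent_frame(points, x, n_neighbors=20, k=2):
    """Estimate a rank-k tangent space at x from a noisy point cloud via
    local PCA: the k leading principal directions of the neighborhood."""
    dists = np.linalg.norm(points - x, axis=1)
    nbrs = points[np.argsort(dists)[:n_neighbors]]
    centered = nbrs - nbrs.mean(axis=0)
    # Rows of vt are principal directions; the top k span the tangent space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]  # (k, d) orthonormal tangent basis at x
```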
    Generalization Guarantees via Algorithm-dependent Rademacher Complexity. (arXiv:2307.02501v1 [stat.ML])
Algorithm- and data-dependent generalization bounds are required to explain the generalization behavior of modern machine learning algorithms. In this context, there exist information-theoretic generalization bounds that involve (various forms of) mutual information, as well as bounds based on hypothesis set stability. We propose a conceptually related, but technically distinct, complexity measure to control generalization error: the empirical Rademacher complexity of an algorithm- and data-dependent hypothesis class. Combining standard properties of Rademacher complexity with the convenient structure of this class, we are able to (i) obtain novel bounds based on the finite fractal dimension, which (a) extend previous fractal-dimension-type bounds from continuous to finite hypothesis classes and (b) avoid a mutual information term that was required in prior work; (ii) greatly simplify the proof of a recent dimension-independent generalization bound for stochastic gradient descent; and (iii) easily recover results for VC classes and compression schemes, similar to approaches based on conditional mutual information.  ( 2 min )
    A Near-Linear Time Algorithm for the Chamfer Distance. (arXiv:2307.03043v1 [cs.DS])
    For any two point sets $A,B \subset \mathbb{R}^d$ of size up to $n$, the Chamfer distance from $A$ to $B$ is defined as $\text{CH}(A,B)=\sum_{a \in A} \min_{b \in B} d_X(a,b)$, where $d_X$ is the underlying distance measure (e.g., the Euclidean or Manhattan distance). The Chamfer distance is a popular measure of dissimilarity between point clouds, used in many machine learning, computer vision, and graphics applications, and admits a straightforward $O(d n^2)$-time brute force algorithm. Further, the Chamfer distance is often used as a proxy for the more computationally demanding Earth-Mover (Optimal Transport) Distance. However, the \emph{quadratic} dependence on $n$ in the running time makes the naive approach intractable for large datasets. We overcome this bottleneck and present the first $(1+\epsilon)$-approximate algorithm for estimating the Chamfer distance with a near-linear running time. Specifically, our algorithm runs in time $O(nd \log (n)/\varepsilon^2)$ and is implementable. Our experiments demonstrate that it is both accurate and fast on large high-dimensional datasets. We believe that our algorithm will open new avenues for analyzing large high-dimensional point clouds. We also give evidence that if the goal is to \emph{report} a $(1+\varepsilon)$-approximate mapping from $A$ to $B$ (as opposed to just its value), then any sub-quadratic time algorithm is unlikely to exist.  ( 2 min )
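For context, the quadratic brute force is easy to beat in low dimensions with spatial indexing, though that approach degrades as $d$ grows and is not the paper's algorithm. A KD-tree baseline, as a sketch:

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(A, B):
    """Exact Chamfer distance CH(A, B) = sum over a in A of min over b in B
    of ||a - b||, computed with a KD-tree rather than the O(d n^2) brute
    force. Effective in low dimension; query time degrades as d grows,
    which is where the paper's (1+eps)-approximation matters."""
    dists, _ = cKDTree(B).query(A, k=1)  # nearest-neighbor distance per point
    return dists.sum()

A = np.random.rand(10_000, 3)
B = np.random.rand(10_000, 3)
print(chamfer_distance(A, B))
```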
    Sampling-based Fast Gradient Rescaling Method for Highly Transferable Adversarial Attacks. (arXiv:2307.02828v1 [cs.CV])
Deep neural networks are known to be vulnerable to adversarial examples crafted by adding human-imperceptible perturbations to benign inputs. After achieving nearly 100% attack success rates in the white-box setting, more focus has shifted to black-box attacks, in which the transferability of adversarial examples has gained significant attention. In either case, common gradient-based methods generally use the sign function to generate perturbations from the gradient update, which offers a roughly correct direction and has achieved great success. But little work pays attention to its possible limitations. In this work, we observe that the deviation between the original gradient and the generated noise may lead to inaccurate gradient update estimation and suboptimal solutions for adversarial transferability. To this end, we propose a Sampling-based Fast Gradient Rescaling Method (S-FGRM). Specifically, we use data rescaling to substitute the sign function without extra computational cost. We further propose a Depth-First Sampling method to eliminate the fluctuation of rescaling and stabilize the gradient update. Our method can be used in any gradient-based attack and can be integrated with various input transformation or ensemble methods to further improve adversarial transferability. Extensive experiments on the standard ImageNet dataset show that our method significantly boosts the transferability of gradient-based attacks and outperforms state-of-the-art baselines.  ( 2 min )
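To make the sign-function limitation concrete, here is a hedged sketch contrasting the classic sign step with a simple gradient-rescaling step. The normalization shown (mean absolute gradient, clipped) is an illustrative stand-in, not the paper's exact rescaling, and the depth-first sampling stabilizer is omitted.

```python
import torch

def perturbation_step(x, grad, eps, rescale=True):
    """One iterative-attack step. sign() discards all magnitude
    information; rescaling keeps relative magnitudes while bounding
    the per-pixel step, which is the intuition S-FGRM builds on."""
    if rescale:
        direction = grad / (grad.abs().mean() + 1e-12)
        direction = direction.clamp(-1.0, 1.0)   # keep the step bounded
    else:
        direction = grad.sign()                  # classic FGSM/I-FGSM update
    return x + eps * direction
```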
    Evaluating the Planning and Operational Resilience of Electrical Distribution Systems with Distributed Energy Resources using Complex Network Theory. (arXiv:2208.11543v4 [eess.SY] UPDATED)
Electrical Distribution Systems are extensively penetrated with Distributed Energy Resources (DERs) to meet energy demands, with the general perception that this enhances the system's resilience. However, integration of DERs may adversely affect grid operation and system resilience due to various factors such as their intermittent availability, the dynamics of weather conditions, non-linearity, complexity, a number of malicious threats, and the increased reliability requirements of consumers. This paper proposes a methodology to evaluate the planning and operational resilience of power distribution systems under extreme events and determines the withstand capability of the electrical network. The proposed framework is developed by effectively employing complex network theory. Correlated networks for undesirable configurations are developed from time-series data of active power monitored at the nodes of the electrical network. For these correlated networks, we compute network parameters such as the clustering coefficient, assortativity coefficient, average degree, and power-law exponent for anticipation, and the percolation threshold to determine the network's withstand capability under extreme conditions. The proposed methodology is also suitable for identifying the hosting capacity of solar panels in the system while maintaining resilience under different unfavourable conditions, and for identifying the most critical nodes of the system that could drive it into non-resilience. The framework is demonstrated on the IEEE 123-node test feeder by generating active-power time-series data for a variety of electrical conditions using the simulation software GridLAB-D. The percolation threshold proved to be an effective metric for determining the planning and operational resilience of the power distribution system.  ( 3 min )
    ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation. (arXiv:2301.13166v3 [cs.AI] UPDATED)
The ability to accurately locate and navigate to a specific object is a crucial capability for embodied agents that operate in the real world and interact with objects to complete tasks. Such object navigation tasks usually require large-scale training in visual environments with labeled objects, which generalizes poorly to novel objects in unknown environments. In this work, we present a novel zero-shot object navigation method, Exploration with Soft Commonsense constraints (ESC), that transfers commonsense knowledge in pre-trained models to open-world object navigation without any navigation experience or any other training on the visual environments. First, ESC leverages a pre-trained vision-and-language model for open-world prompt-based grounding and a pre-trained commonsense language model for room and object reasoning. Then ESC converts commonsense knowledge into navigation actions by modeling it as soft logic predicates for efficient exploration. Extensive experiments on the MP3D, HM3D, and RoboTHOR benchmarks show that our ESC method improves significantly over baselines and achieves new state-of-the-art results for zero-shot object navigation (e.g., a 288% relative Success Rate improvement over CoW on MP3D).  ( 2 min )
    Improving Retrieval-Augmented Large Language Models via Data Importance Learning. (arXiv:2307.03027v1 [cs.LG])
Retrieval augmentation enables large language models to take advantage of external knowledge, for example on tasks like question answering and data imputation. However, the performance of such retrieval-augmented models is limited by the data quality of their underlying retrieval corpus. In this paper, we propose an algorithm based on multilinear extension for evaluating the data importance of retrieved data points. There are exponentially many terms in the multilinear extension, and one key contribution of this paper is a polynomial-time algorithm that, given a retrieval-augmented model with an additive utility function and a validation set, exactly computes the data importance of data points in the retrieval corpus using the multilinear extension of the model's utility function. We further propose an even more efficient $(\epsilon, \delta)$-approximation algorithm. Our experimental results illustrate that we can enhance the performance of large language models by only pruning or reweighting the retrieval corpus, without requiring further training. For some tasks, this even allows a small model (e.g., GPT-JT), augmented with a search engine API, to outperform GPT-3.5 (without retrieval augmentation). Moreover, we show that weights based on the multilinear extension can be computed efficiently in practice (e.g., in less than ten minutes for a corpus with 100 million elements).  ( 2 min )
Conditional Karhunen-Lo\'{e}ve regression model with Basis Adaptation for high-dimensional problems: uncertainty quantification and inverse modeling. (arXiv:2307.02572v1 [cs.LG])
    We propose a methodology for improving the accuracy of surrogate models of the observable response of physical systems as a function of the systems' spatially heterogeneous parameter fields with applications to uncertainty quantification and parameter estimation in high-dimensional problems. Practitioners often formulate finite-dimensional representations of spatially heterogeneous parameter fields using truncated unconditional Karhunen-Lo\'{e}ve expansions (KLEs) for a certain choice of unconditional covariance kernel and construct surrogate models of the observable response with respect to the random variables in the KLE. When direct measurements of the parameter fields are available, we propose improving the accuracy of these surrogate models by representing the parameter fields via conditional Karhunen-Lo\'{e}ve expansions (CKLEs). CKLEs are constructed by conditioning the covariance kernel of the unconditional expansion on the direct measurements via Gaussian process regression and then truncating the corresponding KLE. We apply the proposed methodology to constructing surrogate models via the Basis Adaptation (BA) method of the stationary hydraulic head response, measured at spatially discrete observation locations, of a groundwater flow model of the Hanford Site, as a function of the 1,000-dimensional representation of the model's log-transmissivity field. We find that BA surrogate models of the hydraulic head based on CKLEs are more accurate than BA surrogate models based on unconditional expansions for forward uncertainty quantification tasks. Furthermore, we find that inverse estimates of the hydraulic transmissivity field computed using CKLE-based BA surrogate models are more accurate than those computed using unconditional BA surrogate models.  ( 3 min )
    Towards a safe MLOps Process for the Continuous Development and Safety Assurance of ML-based Systems in the Railway Domain. (arXiv:2307.02867v1 [cs.SE])
Traditional automation technologies alone are not sufficient to enable driverless operation of trains (called Grade of Automation (GoA) 4) on non-restricted infrastructure. The required perception tasks are nowadays realized using Machine Learning (ML) and thus need to be developed and deployed reliably and efficiently. One important aspect of achieving this is to use an MLOps process, which provides improved reproducibility, traceability, collaboration, and continuous adaptation of a driverless operation to changing conditions. MLOps mixes ML application development and operation (Ops) and enables high-frequency software releases and continuous innovation based on feedback from operations. In this paper, we outline a safe MLOps process for the continuous development and safety assurance of ML-based systems in the railway domain. It integrates system engineering, safety assurance, and the ML life-cycle in a comprehensive workflow. We present the individual stages of the process and their interactions. Moreover, we describe relevant challenges in automating the different stages of the safe MLOps process.
    OpenDelta: A Plug-and-play Library for Parameter-efficient Adaptation of Pre-trained Models. (arXiv:2307.03084v1 [cs.LG])
    The scale of large pre-trained models (PTMs) poses significant challenges in adapting to downstream tasks due to the high optimization overhead and storage costs associated with full-parameter fine-tuning. To address this, many studies explore parameter-efficient tuning methods, also framed as "delta tuning", which updates only a small subset of parameters, known as "delta modules", while keeping the backbone model's parameters fixed. However, the practicality and flexibility of delta tuning have been limited due to existing implementations that directly modify the code of the backbone PTMs and hard-code specific delta tuning methods for each PTM. In this paper, we present OpenDelta, an open-source library that overcomes these limitations by providing a plug-and-play implementation of various delta tuning methods. Our novel techniques eliminate the need to modify the backbone PTMs' code, making OpenDelta compatible with different, even novel PTMs. OpenDelta is designed to be simple, modular, and extensible, providing a comprehensive platform for researchers and practitioners to adapt large PTMs efficiently.
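To illustrate the delta-tuning idea itself (not OpenDelta's actual API, which is not reproduced here), the sketch below implements a LoRA-style delta module in plain PyTorch: the backbone weights stay frozen and only a small low-rank update is trained. The plug-and-play property the paper describes amounts to attaching such modules to a backbone without editing its source code.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A delta module in the LoRA style: the frozen base weight is kept
    intact and a low-rank update B @ A is learned alongside it."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # backbone stays fixed
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # delta starts at zero

    def forward(self, x):
        return self.base(x) + x @ self.A.T @ self.B.T

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable delta parameters: {trainable}")  # tiny vs. the frozen base
```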
    Learning Low-Dimensional Nonlinear Structures from High-Dimensional Noisy Data: An Integral Operator Approach. (arXiv:2203.00126v2 [stat.ML] UPDATED)
    We propose a kernel-spectral embedding algorithm for learning low-dimensional nonlinear structures from high-dimensional and noisy observations, where the datasets are assumed to be sampled from an intrinsically low-dimensional manifold and corrupted by high-dimensional noise. The algorithm employs an adaptive bandwidth selection procedure which does not rely on prior knowledge of the underlying manifold. The obtained low-dimensional embeddings can be further utilized for downstream purposes such as data visualization, clustering and prediction. Our method is theoretically justified and practically interpretable. Specifically, we establish the convergence of the final embeddings to their noiseless counterparts when the dimension and size of the samples are comparably large, and characterize the effect of the signal-to-noise ratio on the rate of convergence and phase transition. We also prove convergence of the embeddings to the eigenfunctions of an integral operator defined by the kernel map of some reproducing kernel Hilbert space capturing the underlying nonlinear structures. Numerical simulations and analysis of three real datasets show the superior empirical performance of the proposed method, compared to many existing methods, on learning various manifolds in diverse applications.
    Track Mix Generation on Music Streaming Services using Transformers. (arXiv:2307.03045v1 [cs.IR])
    This paper introduces Track Mix, a personalized playlist generation system released in 2022 on the music streaming service Deezer. Track Mix automatically generates "mix" playlists inspired by initial music tracks, allowing users to discover music similar to their favorite content. To generate these mixes, we consider a Transformer model trained on millions of track sequences from user playlists. In light of the growing popularity of Transformers in recent years, we analyze the advantages, drawbacks, and technical challenges of using such a model for mix generation on the service, compared to a more traditional collaborative filtering approach. Since its release, Track Mix has been generating playlists for millions of users daily, enhancing their music discovery experience on Deezer.  ( 2 min )
    FREEDOM: Target Label & Source Data & Domain Information-Free Multi-Source Domain Adaptation for Unsupervised Personalization. (arXiv:2307.02493v1 [cs.LG])
From a service perspective, Multi-Source Domain Adaptation (MSDA) is a promising scenario for adapting a deployed model to a client's dataset. It can provide adaptation without a target label and supports the case where a source dataset is constructed from multiple domains. However, it is impractical in that its training heavily relies on prior domain information about the multi-source dataset -- how many domains exist and the domain label of each data sample. Moreover, MSDA requires both source and target datasets simultaneously (physically), causing storage limitations on the client device or data privacy issues from transferring client data to a server. For a more practical scenario of model adaptation from a service provider's point of view, we relax these constraints and present a novel problem scenario of Three-Free Domain Adaptation, namely TFDA, where 1) target labels, 2) the source dataset, and, above all, 3) source domain information (domain labels + the number of domains) are unavailable. Under this problem scenario, we propose a practical adaptation framework called FREEDOM. It leverages the power of generative models, disentangling data into class and style aspects, where the style is defined as the class-independent information from the source data and is designed with a nonparametric Bayesian approach. In the adaptation stage, FREEDOM aims to match the source class distribution with the target's under the philosophy that the class distribution is consistent even if the style is different; afterwards, only part of the classification model is deployed as a personalized network. As a result, FREEDOM achieves state-of-the-art or comparable performance even without domain information, with reduced final model size on the target side, independent of the number of source domains.  ( 3 min )
    TransformerG2G: Adaptive time-stepping for learning temporal graph embeddings using transformers. (arXiv:2307.02588v1 [cs.LG])
Dynamic graph embedding has emerged as a very effective technique for addressing diverse temporal graph analytic tasks (i.e., link prediction, node classification, recommender systems, anomaly detection, and graph generation) in various applications. Such temporal graphs exhibit heterogeneous transient dynamics, varying time intervals, and highly evolving node features throughout their evolution. Hence, incorporating long-range dependencies from the historical graph context plays a crucial role in accurately learning their temporal dynamics. In this paper, we develop a graph embedding model with uncertainty quantification, TransformerG2G, by exploiting the advanced transformer encoder to first learn intermediate node representations from its current state ($t$) and previous context (over timestamps [$t-1, t-l$], where $l$ is the length of context). Moreover, we employ two projection layers to generate lower-dimensional multivariate Gaussian distributions as each node's latent embedding at timestamp $t$. We consider diverse benchmarks with varying levels of "novelty" as measured by the TEA plots. Our experiments demonstrate that the proposed TransformerG2G model outperforms conventional multi-step methods and our prior work (DynG2G) in terms of both link prediction accuracy and computational efficiency, especially for a high degree of novelty. Furthermore, the learned time-dependent attention weights across multiple graph snapshots reveal the development of an automatic adaptive time stepping enabled by the transformer. Importantly, by examining the attention weights, we can uncover temporal dependencies, identify influential elements, and gain insights into the complex interactions within the graph structure. For example, we identified a strong correlation between attention weights and node degree at the various stages of the graph topology evolution.  ( 3 min )
    Unleashing the Power of Electrocardiograms: A novel approach for Patient Identification in Healthcare Systems with ECG Signals. (arXiv:2302.06529v2 [cs.LG] UPDATED)
Over the course of the past two decades, a substantial body of research has substantiated the viability of utilising cardiac signals as a biometric modality. This paper presents a novel approach for patient identification in healthcare systems using electrocardiogram signals. A convolutional neural network is used to classify users based on images extracted from ECG signals. The proposed identification system is evaluated on multiple databases, providing a comprehensive understanding of its potential in real-world scenarios. The impact of Cardiovascular Diseases on generic user identification has been largely overlooked in previous studies. The presented method takes into account the cardiovascular condition of the patients, ensuring that the results obtained are not biased or limited. Furthermore, the results obtained are consistent and reliable, with lower error rates and higher accuracy metrics, as demonstrated through extensive experimentation. All these features make the proposed method a valuable contribution to the field of patient identification in healthcare systems and a strong contender for practical applications.  ( 2 min )
    Co-design Hardware and Algorithm for Vector Search. (arXiv:2306.11182v3 [cs.LG] UPDATED)
Vector search has emerged as the foundation for large-scale information retrieval and machine learning systems, with search engines like Google and Bing processing tens of thousands of queries per second on petabyte-scale document datasets by evaluating vector similarities between encoded query texts and web documents. As performance demands for vector search systems surge, accelerated hardware offers a promising solution in the post-Moore's Law era. We introduce FANNS, an end-to-end and scalable vector search framework on FPGAs. Given a user-provided recall requirement on a dataset and a hardware resource budget, FANNS automatically co-designs hardware and algorithm, subsequently generating the corresponding accelerator. The framework also supports scale-out by incorporating a hardware TCP/IP stack in the accelerator. FANNS attains up to 23.0$\times$ and 37.2$\times$ speedup compared to FPGA and CPU baselines, respectively, and demonstrates superior scalability to GPUs, achieving 5.5$\times$ and 7.6$\times$ speedup in median and 95th-percentile (P95) latency within an eight-accelerator configuration. The remarkable performance of FANNS lays a robust groundwork for future FPGA integration in data centers and AI supercomputers.  ( 3 min )
    Conditional independence testing under model misspecification. (arXiv:2307.02520v1 [stat.ML])
Conditional independence (CI) testing is fundamental and challenging in modern statistics and machine learning. Many modern methods for CI testing rely on powerful supervised learning methods to learn regression functions or Bayes predictors as an intermediate step. Although these methods are guaranteed to control Type-I error when the supervised learning methods accurately estimate the regression functions or Bayes predictors, their behavior is less understood when they fail due to model misspecification. In a broader sense, model misspecification can arise even when universal approximators (e.g., deep neural nets) are employed. Motivated by this, we study the performance of regression-based CI tests under model misspecification. Namely, we propose new approximations or upper bounds for the testing errors of three regression-based tests that depend on misspecification errors. Moreover, we introduce the Rao-Blackwellized Predictor Test (RBPT), a novel regression-based CI test that is robust against model misspecification. Finally, we conduct experiments with artificial and real data, showcasing the usefulness of our theory and methods.
    Stacking of Hyperparameter Tuned Models for Tagging Coding Problems. (arXiv:2306.10077v2 [cs.LG] UPDATED)
Coding problems are problems that require a solution in the form of a computer program, and they are popular among students and professionals because they enhance skills and career opportunities. An AI system that helps those who practice coding problems would be highly useful, and there is huge potential for such a system. In this work, we propose a model that uses stacking of hyperparameter-tuned boosting models to achieve impressive metric scores of 77.8% accuracy and 0.815 PR-AUC on a dataset scraped from Codeforces and Leetcode. We open-source the dataset and the models developed for this work.  ( 2 min )
    A Real-time Human Pose Estimation Approach for Optimal Sensor Placement in Sensor-based Human Activity Recognition. (arXiv:2307.02906v1 [cs.LG])
    Sensor-based Human Activity Recognition facilitates unobtrusive monitoring of human movements. However, determining the most effective sensor placement for optimal classification performance remains challenging. This paper introduces a novel methodology to resolve this issue, using real-time 2D pose estimations derived from video recordings of target activities. The derived skeleton data provides a unique strategy for identifying the optimal sensor location. We validate our approach through a feasibility study, applying inertial sensors to monitor 13 different activities across ten subjects. Our findings indicate that the vision-based method for sensor placement offers comparable results to the conventional deep learning approach, demonstrating its efficacy. This research significantly advances the field of Human Activity Recognition by providing a lightweight, on-device solution for determining the optimal sensor placement, thereby enhancing data anonymization and supporting a multimodal classification approach.
    Sandwiched Video Compression: Efficiently Extending the Reach of Standard Codecs with Neural Wrappers. (arXiv:2303.11473v2 [eess.IV] UPDATED)
We propose sandwiched video compression -- a video compression system that wraps neural networks around a standard video codec. The sandwich framework consists of a neural pre-processor and post-processor with a standard video codec between them. The networks are trained jointly to optimize a rate-distortion loss function with the goal of significantly improving over the standard codec in various compression scenarios. End-to-end training in this setting requires a differentiable proxy for the standard video codec, which incorporates temporal processing with motion compensation, inter/intra mode decisions, and in-loop filtering. We propose differentiable approximations to key video codec components and demonstrate that, in addition to providing meaningful compression improvements over the standard codec, the neural codes of the sandwich lead to significantly better rate-distortion performance in two important scenarios. When transporting high-resolution video via low-resolution HEVC, the sandwich system obtains 6.5 dB improvements over standard HEVC. More importantly, using the well-known perceptual similarity metric LPIPS, we observe 30% improvements in rate at the same quality over HEVC. Last but not least, we show that pre- and post-processors formed by very modestly parameterized, lightweight networks can closely approximate these results.
    ShadowNet: A Secure and Efficient On-device Model Inference System for Convolutional Neural Networks. (arXiv:2011.05905v4 [cs.CR] UPDATED)
    With the increased usage of AI accelerators on mobile and edge devices, on-device machine learning (ML) is gaining popularity. Thousands of proprietary ML models are being deployed today on billions of untrusted devices. This raises serious security concerns about model privacy. However, protecting model privacy without losing access to the untrusted AI accelerators is a challenging problem. In this paper, we present a novel on-device model inference system, ShadowNet. ShadowNet protects the model privacy with Trusted Execution Environment (TEE) while securely outsourcing the heavy linear layers of the model to the untrusted hardware accelerators. ShadowNet achieves this by transforming the weights of the linear layers before outsourcing them and restoring the results inside the TEE. The non-linear layers are also kept secure inside the TEE. ShadowNet's design ensures efficient transformation of the weights and the subsequent restoration of the results. We build a ShadowNet prototype based on TensorFlow Lite and evaluate it on five popular CNNs, namely, MobileNet, ResNet-44, MiniVGG, ResNet-404, and YOLOv4-tiny. Our evaluation shows that ShadowNet achieves strong security guarantees with reasonable performance, offering a practical solution for secure on-device model inference.
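The core outsourcing pattern above (mask the weights inside the TEE, run the heavy linear algebra on untrusted hardware, then cheaply undo the mask) can be illustrated in a few lines. The random invertible mixing matrix below is a simplified stand-in; ShadowNet's actual transformations are engineered for efficiency and stronger obfuscation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 64, 32
W = rng.standard_normal((d_in, d_out))   # proprietary layer weights

# Inside the TEE: hide W with a random invertible mixing matrix M.
M = rng.standard_normal((d_out, d_out))
W_masked = W @ M                          # what the untrusted accelerator sees

# On the untrusted accelerator: heavy linear work on masked weights only.
x = rng.standard_normal((1, d_in))
y_masked = x @ W_masked

# Back inside the TEE: cheap restoration recovers the true activations.
y = y_masked @ np.linalg.inv(M)
assert np.allclose(y, x @ W)
```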
    Effects of data time lag in a decision-making system using machine learning for pork price prediction. (arXiv:2305.05677v2 [cs.LG] UPDATED)
    Spain is the third-largest producer of pork meat in the world, and many farms in several regions depend on the evolution of this market. However, the current pricing system is unfair, as some actors have better market information than others. In this context, historical pricing is an easy-to-find and affordable data source that can help all agents to be better informed. However, the time lag in data acquisition can affect their pricing decisions. In this paper, we study the effect that data acquisition delay has on a price prediction system using multiple prediction algorithms. We describe the integration of the best proposal into a decision support system prototype and test it in a real-case scenario. Specifically, we use public data from the most important regional pork meat markets in Spain published by the Ministry of Agriculture with a two-week delay and subscription-based data of the same markets obtained on the same day. The results show that the error difference between the best public and data subscription models is 0.6 Euro cents in favor of the data without delay. The market dimension makes these differences significant in the supply chain, giving pricing agents a better tool to negotiate market prices.
    ContainerGym: A Real-World Reinforcement Learning Benchmark for Resource Allocation. (arXiv:2307.02991v1 [cs.LG])
We present ContainerGym, a benchmark for reinforcement learning inspired by a real-world industrial resource allocation task. The proposed benchmark encodes a range of challenges commonly encountered in real-world sequential decision-making problems, such as uncertainty. It can be configured to instantiate problems of varying degrees of difficulty, e.g., in terms of variable dimensionality. Our benchmark differs from other reinforcement learning benchmarks, including those aiming to encode real-world difficulties, in that it is directly derived from a real-world industrial problem, which underwent minimal simplification and streamlining. It is sufficiently versatile to evaluate reinforcement learning algorithms on any real-world problem that fits our resource allocation framework. We provide results for standard baseline methods. Going beyond the usual training reward curves, our results and the statistical tools used to interpret them allow us to highlight interesting limitations of well-known deep reinforcement learning algorithms, namely PPO, TRPO, and DQN.
    Context-Aware Configuration and Management of WiFi Direct Groups for Real Opportunistic Networks. (arXiv:2307.03126v1 [cs.NI])
Wi-Fi Direct is a promising technology for the support of device-to-device (D2D) communications on commercial mobile devices. However, the standard as it is does not suffice to support the real deployment of networking solutions based entirely on D2D, such as opportunistic networks. In fact, Wi-Fi Direct presents some characteristics that could limit the autonomous creation of D2D connections among users' personal devices. Specifically, the standard explicitly requires the user's authorization to establish a connection between two or more devices, and it provides limited support for inter-group communication. In some cases, this might lead to the creation of isolated groups of nodes that cannot communicate with each other. In this paper, we propose a novel middleware-layer protocol for the efficient configuration and management of Wi-Fi Direct groups (WiFi Direct Group Manager, WFD-GM) to enable autonomous connections and inter-group communication. This enables opportunistic networks in real conditions (e.g., variable mobility and network size). WFD-GM defines a context function that takes into account heterogeneous parameters for the creation of the best group configuration in a specific time window, including an index of node stability and power levels. We evaluate the protocol's performance by simulating three reference scenarios including different mobility models, geographical areas, and numbers of nodes. Simulations are also supported by experimental results related to the evaluation, in a real testbed, of the involved context parameters. We compare WFD-GM with state-of-the-art solutions and show that it performs significantly better than a baseline approach in scenarios with medium/low mobility, and is comparable to it in the case of high mobility, without introducing additional overhead.  ( 3 min )
    UTRNet: High-Resolution Urdu Text Recognition In Printed Documents. (arXiv:2306.15782v2 [cs.CV] UPDATED)
In this paper, we propose a novel approach to address the challenges of printed Urdu text recognition using high-resolution, multi-scale semantic feature extraction. Our proposed UTRNet architecture, a hybrid CNN-RNN model, demonstrates state-of-the-art performance on benchmark datasets. To address the limitations of previous works, which struggle to generalize to the intricacies of the Urdu script and lack sufficient annotated real-world data, we introduce UTRSet-Real, a large-scale annotated real-world dataset comprising over 11,000 lines, and UTRSet-Synth, a synthetic dataset with 20,000 lines closely resembling real-world data, and we have made corrections to the ground truth of the existing IIITH dataset, making it a more reliable resource for future research. We also provide UrduDoc, a benchmark dataset for Urdu text line detection in scanned documents. Additionally, we have developed an online tool for end-to-end Urdu OCR from printed documents by integrating UTRNet with a text detection model. Our work not only addresses the current limitations of Urdu OCR but also paves the way for future research in this area and facilitates the continued advancement of Urdu OCR technology. The project page with source code, datasets, annotations, trained models, and online tool is available at abdur75648.github.io/UTRNet.  ( 2 min )
    How to Detect Unauthorized Data Usages in Text-to-image Diffusion Models. (arXiv:2307.03108v1 [cs.CV])
Recent text-to-image diffusion models have shown surprising performance in generating high-quality images. However, concerns have arisen regarding the unauthorized usage of data during the training process. One example is when a model trainer collects a set of images created by a particular artist and attempts to train a model capable of generating similar images without obtaining permission from the artist. To address this issue, it becomes crucial to detect unauthorized data usage. In this paper, we propose a method for detecting such unauthorized data usage by planting injected memorization into text-to-image diffusion models trained on the protected dataset. Specifically, we modify the protected image dataset by adding unique content to the images, such as stealthy image warping functions that are imperceptible to human vision but can be captured and memorized by diffusion models. By analyzing whether the model has memorized the injected content (i.e., whether the generated images are processed by the chosen post-processing function), we can detect models that illegally utilized the unauthorized data. Our experiments conducted on Stable Diffusion and LoRA models demonstrate the effectiveness of the proposed method in detecting unauthorized data usage.  ( 2 min )
    FLuID: Mitigating Stragglers in Federated Learning using Invariant Dropout. (arXiv:2307.02623v1 [cs.LG])
    Federated Learning (FL) allows machine learning models to train locally on individual mobile devices, synchronizing model updates via a shared server. This approach safeguards user privacy; however, it also generates a heterogeneous training environment due to the varying performance capabilities across devices. As a result, straggler devices with lower performance often dictate the overall training time in FL. In this work, we aim to alleviate this performance bottleneck due to stragglers by dynamically balancing the training load across the system. We introduce Invariant Dropout, a method that extracts a sub-model based on the weight update threshold, thereby minimizing potential impacts on accuracy. Building on this dropout technique, we develop an adaptive training framework, Federated Learning using Invariant Dropout (FLuID). FLuID offers a lightweight sub-model extraction to regulate computational intensity, thereby reducing the load on straggler devices without affecting model quality. Our method leverages neuron updates from non-straggler devices to construct a tailored sub-model for each straggler based on client performance profiling. Furthermore, FLuID can dynamically adapt to changes in stragglers as runtime conditions shift. We evaluate FLuID using five real-world mobile clients. The evaluations show that Invariant Dropout maintains baseline model efficiency while alleviating the performance bottleneck of stragglers through a dynamic, runtime approach.  ( 2 min )
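The sub-model extraction step can be sketched briefly. The rule below (drop neurons whose recent weight updates fall below a threshold, since they appear "invariant" and their omission should perturb training least) is a simplified reading of the abstract; the exact thresholding rule and per-client profiling in FLuID are assumptions here, not the authors' implementation.

```python
import torch

def invariant_dropout_mask(prev_weights, new_weights, threshold):
    """Build a straggler sub-model mask in the spirit of Invariant Dropout.
    A neuron is a row of the weight matrix; neurons with small total
    update are flagged as droppable for slower clients."""
    update = (new_weights - prev_weights).abs().sum(dim=1)  # per-neuron change
    return update < threshold          # True = safe to drop on stragglers

prev = torch.randn(128, 64)
new = prev + 0.01 * torch.randn(128, 64)
mask = invariant_dropout_mask(prev, new, threshold=0.5)
print(f"{mask.sum().item()} of 128 neurons dropped for the straggler sub-model")
```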
    Learning to Solve Tasks with Exploring Prior Behaviours. (arXiv:2307.02889v1 [cs.RO])
Demonstrations are widely used in Deep Reinforcement Learning (DRL) to facilitate solving tasks with sparse rewards. However, tasks in real-world scenarios often have initial conditions that differ from those in the demonstration, which requires additional prior behaviours. For example, suppose we are given a demonstration for the task of picking up an object from an open drawer, but the drawer is closed during training. Without acquiring the prior behaviour of opening the drawer, the robot is unlikely to solve the task. To address this, in this paper we propose Intrinsic Rewards Driven Example-based Control (IRDEC). Our method can endow agents with the ability to explore and acquire the required prior behaviours and then connect them to the task-specific behaviours in the demonstration to solve sparse-reward tasks, without requiring additional demonstrations of the prior behaviours. Our method outperforms other baselines on three navigation tasks and one robotic manipulation task with sparse rewards. Code is available at https://github.com/Ricky-Zhu/IRDEC.  ( 2 min )
    PLIERS: a Popularity-Based Recommender System for Content Dissemination in Online Social Networks. (arXiv:2307.02865v1 [cs.IR])
    In this paper, we propose a novel tag-based recommender system called PLIERS, which relies on the assumption that users are mainly interested in items and tags with similar popularity to those they already own. PLIERS is aimed at reaching a good tradeoff between algorithmic complexity and the level of personalization of recommended items. To evaluate PLIERS, we performed a set of experiments on real OSN datasets, demonstrating that it outperforms state-of-the-art solutions in terms of personalization, relevance, and novelty of recommendations.  ( 2 min )
  • Open

    Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations. (arXiv:2206.04779v3 [cs.LG] UPDATED)
    Offline reinforcement learning has shown great promise in leveraging large pre-collected datasets for policy learning, allowing agents to forgo often-expensive online data collection. However, offline reinforcement learning from visual observations with continuous action spaces remains under-explored, with a limited understanding of the key challenges in this complex domain. In this paper, we establish simple baselines for continuous control in the visual domain and introduce a suite of benchmarking tasks for offline reinforcement learning from visual observations designed to better represent the data distributions present in real-world offline RL problems and guided by a set of desiderata for offline RL from visual observations, including robustness to visual distractions and visually identifiable changes in dynamics. Using this suite of benchmarking tasks, we show that simple modifications to two popular vision-based online reinforcement learning algorithms, DreamerV2 and DrQ-v2, suffice to outperform existing offline RL methods and establish competitive baselines for continuous control in the visual domain. We rigorously evaluate these algorithms and perform an empirical evaluation of the differences between state-of-the-art model-based and model-free offline RL methods for continuous control from visual observations. All code and data used in this evaluation are open-sourced to facilitate progress in this domain.  ( 3 min )
    Active Policy Improvement from Multiple Black-box Oracles. (arXiv:2306.10259v2 [cs.LG] UPDATED)
Reinforcement learning (RL) has made significant strides in various complex domains. However, identifying an effective policy via RL often necessitates extensive exploration. Imitation learning aims to mitigate this issue by using expert demonstrations to guide exploration. In real-world scenarios, one often has access to multiple suboptimal black-box experts, rather than a single optimal oracle. These experts do not universally outperform each other across all states, presenting a challenge in actively deciding which oracle to use and in which state. We introduce MAPS and MAPS-SE, a class of policy improvement algorithms that perform imitation learning from multiple suboptimal oracles. In particular, MAPS actively selects which of the oracles to imitate and improves their value function estimates, and MAPS-SE additionally leverages an active state exploration criterion to determine which states one should explore. We provide a comprehensive theoretical analysis and demonstrate that MAPS and MAPS-SE enjoy a sample-efficiency advantage over state-of-the-art policy improvement algorithms. Empirical results show that MAPS-SE significantly accelerates policy optimization via state-wise imitation learning from multiple oracles across a broad spectrum of control tasks in the DeepMind Control Suite. Our code is publicly available at: https://github.com/ripl/maps.  ( 2 min )
    Statistical-Computational Tradeoffs in Mixed Sparse Linear Regression. (arXiv:2303.02118v2 [stat.ML] UPDATED)
    We consider the problem of mixed sparse linear regression with two components, where two real $k$-sparse signals $\beta_1, \beta_2$ are to be recovered from $n$ unlabelled noisy linear measurements. The sparsity is allowed to be sublinear in the dimension, and additive noise is assumed to be independent Gaussian with variance $\sigma^2$. Prior work has shown that the problem suffers from a $\frac{k}{SNR^2}$-to-$\frac{k^2}{SNR^2}$ statistical-to-computational gap, resembling other computationally challenging high-dimensional inference problems such as Sparse PCA and Robust Sparse Mean Estimation; here $SNR$ is the signal-to-noise ratio. We establish the existence of a more extensive computational barrier for this problem through the method of low-degree polynomials, but show that the problem is computationally hard only in a very narrow symmetric parameter regime. We identify a smooth information-computation tradeoff between the sample complexity $n$ and runtime for any randomized algorithm in this hard regime. Via a simple reduction, this provides novel rigorous evidence for the existence of a computational barrier to solving exact support recovery in sparse phase retrieval with sample complexity $n = \tilde{o}(k^2)$. Our second contribution is to analyze a simple thresholding algorithm which, outside of the narrow regime where the problem is hard, solves the associated mixed regression detection problem in $O(np)$ time with square-root the number of samples and matches the sample complexity required for (non-mixed) sparse linear regression; this allows the recovery problem to be subsequently solved by state-of-the-art techniques from the dense case. As a special case of our results, we show that this simple algorithm is order-optimal among a large family of algorithms in solving exact signed support recovery in sparse linear regression.  ( 3 min )
    On the Noise Sensitivity of the Randomized SVD. (arXiv:2305.17435v2 [cs.IT] UPDATED)
The randomized singular value decomposition (R-SVD) is a popular sketching-based algorithm for efficiently computing the partial SVD of a large matrix. When the matrix is low-rank, the R-SVD produces its partial SVD exactly; but when the rank is large, it only yields an approximation. Motivated by applications in data science and principal component analysis (PCA), we analyze the R-SVD under a low-rank signal plus noise measurement model; specifically, when its input is a spiked random matrix. The singular values produced by the R-SVD are shown to exhibit a BBP-like phase transition: when the SNR exceeds a certain detectability threshold, which depends on the dimension reduction factor, the largest singular value is an outlier; below the threshold, no outlier emerges from the bulk of singular values. We further compute asymptotic formulas for the overlap between the ground truth signal singular vectors and the approximations produced by the R-SVD. Dimensionality reduction has the adverse effect of amplifying the noise in a highly nonlinear manner. Our results demonstrate the statistical advantage -- in both signal detection and estimation -- of the R-SVD over more naive sketched PCA variants; the advantage is especially dramatic when the sketching dimension is small. Our analysis is asymptotically exact, and substantially more fine-grained than existing operator-norm error bounds for the R-SVD, which largely fail to give meaningful error estimates in the moderate SNR regime. It applies for a broad family of sketching matrices previously considered in the literature, including Gaussian i.i.d. sketches, random projections, and the sub-sampled Hadamard transform, among others. Lastly, we derive an optimal singular value shrinker for singular values and vectors obtained through the R-SVD, which may be useful for applications in matrix denoising.  ( 3 min )
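For reference, the basic R-SVD being analyzed fits in a few lines; the spiked input below mirrors the low-rank signal-plus-noise model (Gaussian sketching and a fixed oversampling parameter are assumptions of this sketch):

import numpy as np

def randomized_svd(A, k, oversample=10):
    m, n = A.shape
    G = np.random.randn(n, k + oversample)   # Gaussian sketching matrix
    Q, _ = np.linalg.qr(A @ G)               # orthonormal basis for the sketched range
    B = Q.T @ A                              # small (k + oversample) x n matrix
    Uh, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Uh)[:, :k], s[:k], Vt[:k]

# Spiked random matrix: rank-1 signal plus noise
m, n, snr = 500, 400, 3.0
u = np.random.randn(m); u /= np.linalg.norm(u)
v = np.random.randn(n); v /= np.linalg.norm(v)
A = snr * np.outer(u, v) + np.random.randn(m, n) / np.sqrt(n)
U, s, Vt = randomized_svd(A, k=5)
print(s)  # whether the top value separates from the bulk depends on SNR and sketch size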
    Fairness in Multi-Task Learning via Wasserstein Barycenters. (arXiv:2306.10155v2 [stat.ML] UPDATED)
    Algorithmic Fairness is an established field in machine learning that aims to reduce biases in data. Recent advances have proposed various methods to ensure fairness in a univariate environment, where the goal is to de-bias a single task. However, extending fairness to a multi-task setting, where more than one objective is optimised using a shared representation, remains underexplored. To bridge this gap, we develop a method that extends the definition of Strong Demographic Parity to multi-task learning using multi-marginal Wasserstein barycenters. Our approach provides a closed form solution for the optimal fair multi-task predictor including both regression and binary classification tasks. We develop a data-driven estimation procedure for the solution and run numerical experiments on both synthetic and real datasets. The empirical results highlight the practical value of our post-processing methodology in promoting fair decision-making.  ( 2 min )
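A rough sketch of the one-dimensional, single-task building block: post-processing scores so that every group shares the Wasserstein-2 barycenter distribution, whose quantile function is the population-weighted average of the group quantile functions. The paper's multi-task, multi-marginal construction is more general, and fair_postprocess below is a hypothetical helper, not the authors' code:

import numpy as np

def fair_postprocess(scores, groups):
    out = np.empty_like(scores, dtype=float)
    uniq, counts = np.unique(groups, return_counts=True)
    weights = counts / counts.sum()
    for g in uniq:
        mask = groups == g
        # each score's rank within its own group (empirical group CDF value)
        ranks = np.searchsorted(np.sort(scores[mask]), scores[mask], side="right") / mask.sum()
        # evaluate the barycenter quantile function at those ranks
        out[mask] = sum(w * np.quantile(scores[groups == h], ranks)
                        for h, w in zip(uniq, weights))
    return out

rng = np.random.default_rng(0)
groups = rng.integers(0, 2, 1000)
scores = rng.normal(loc=0.5 * groups, scale=1.0)  # group 1 scores shifted upward
fair = fair_postprocess(scores, groups)
print(fair[groups == 0].mean(), fair[groups == 1].mean())  # roughly aligned after mapping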
    A Finite-Particle Convergence Rate for Stein Variational Gradient Descent. (arXiv:2211.09721v4 [cs.LG] UPDATED)
    We provide the first finite-particle convergence rate for Stein variational gradient descent (SVGD), a popular algorithm for approximating a probability distribution with a collection of particles. Specifically, whenever the target distribution is sub-Gaussian with a Lipschitz score, SVGD with n particles and an appropriate step size sequence drives the kernel Stein discrepancy to zero at an order 1/sqrt(log log n) rate. We suspect that the dependence on n can be improved, and we hope that our explicit, non-asymptotic proof strategy will serve as a template for future refinements.  ( 2 min )
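For context, a single SVGD update with an RBF kernel looks like the sketch below; the bandwidth and step size are arbitrary choices here, and the paper's rate concerns the kernel Stein discrepancy of exactly this kind of iteration:

import numpy as np

def svgd_step(X, score, h=0.5, eps=0.1):
    # phi(x_i) = (1/n) sum_j [ k(x_j, x_i) score(x_j) + grad_{x_j} k(x_j, x_i) ]
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]            # x_i - x_j
    K = np.exp(-np.sum(diff**2, -1) / (2 * h**2))   # RBF kernel matrix
    drive = K @ score(X)                            # pulls particles toward high density
    repulse = (K[..., None] * diff).sum(axis=1) / h**2  # keeps particles spread apart
    return X + eps * (drive + repulse) / n

# Target: standard 2-D Gaussian, whose score is -x
rng = np.random.default_rng(0)
X = rng.normal(3.0, 0.5, size=(100, 2))  # particles initialized far from the target
for _ in range(500):
    X = svgd_step(X, lambda x: -x)
print(X.mean(0), X.std(0))               # approaches mean 0 and std 1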
    Descriptive vs. inferential community detection in networks: pitfalls, myths, and half-truths. (arXiv:2112.00183v7 [physics.soc-ph] UPDATED)
    Community detection is one of the most important methodological fields of network science, and one which has attracted a significant amount of attention over the past decades. This area deals with the automated division of a network into fundamental building blocks, with the objective of providing a summary of its large-scale structure. Despite its importance and widespread adoption, there is a noticeable gap between what is arguably the state-of-the-art and the methods that are actually used in practice in a variety of fields. Here we attempt to address this discrepancy by dividing existing methods according to whether they have a "descriptive" or an "inferential" goal. While descriptive methods find patterns in networks based on context-dependent notions of community structure, inferential methods articulate generative models, and attempt to fit them to data. In this way, they are able to provide insights into the mechanisms of network formation, and separate structure from randomness in a manner supported by statistical evidence. We review how employing descriptive methods with inferential aims is riddled with pitfalls and misleading answers, and thus should be in general avoided. We argue that inferential methods are more typically aligned with clearer scientific questions, yield more robust results, and should be in many cases preferred. We attempt to dispel some myths and half-truths often believed when community detection is employed in practice, in an effort to improve both the use of such methods as well as the interpretation of their results.  ( 3 min )
    Computable Stability for Persistence Rank Function Machine Learning. (arXiv:2307.02904v1 [math.AT])
    Persistent homology barcodes and diagrams are a cornerstone of topological data analysis. Widely used in many real data settings, they relate variation in topological information (as measured by cellular homology) with variation in data, however, they are challenging to use in statistical settings due to their complex geometric structure. In this paper, we revisit the persistent homology rank function -- an invariant measure of ``shape" that was introduced before barcodes and persistence diagrams and captures the same information in a form that is more amenable to data and computation. In particular, since they are functions, techniques from functional data analysis -- a domain of statistics adapted for functions -- apply directly to persistent homology when represented by rank functions. Rank functions, however, have been less popular than barcodes because they face the challenge that stability -- a property that is crucial to validate their use in data analysis -- is difficult to guarantee, mainly due to metric concerns on rank function space. However, rank functions extend more naturally to the increasingly popular and important case of multiparameter persistent homology. In this paper, we study the performance of rank functions in functional inferential statistics and machine learning on both simulated and real data, and in both single and multiparameter persistent homology. We find that the use of persistent homology captured by rank functions offers a clear improvement over existing approaches. We then provide theoretical justification for our numerical experiments and applications to data by deriving several stability results for single- and multiparameter persistence rank functions under various metrics with the underlying aim of computational feasibility and interpretability.  ( 3 min )
    Learning Curves for Heterogeneous Feature-Subsampled Ridge Ensembles. (arXiv:2307.03176v1 [stat.ML])
Feature bagging is a well-established ensembling method which aims to reduce prediction variance by training estimators in an ensemble on random subsamples or projections of features. Typically, ensembles are chosen to be homogeneous, in the sense that the number of feature dimensions available to an estimator is uniform across the ensemble. Here, we introduce heterogeneous feature ensembling, with estimators built on varying numbers of feature dimensions, and consider its performance in a linear regression setting. We study an ensemble of linear predictors, each fit using ridge regression on a subset of the available features. We allow the number of features included in these subsets to vary. Using the replica trick from statistical physics, we derive learning curves for ridge ensembles with deterministic linear masks. We obtain explicit expressions for the learning curves in the case of equicorrelated data with isotropic feature noise. Using the derived expressions, we investigate the effect of subsampling and ensembling, finding sharp transitions in the optimal ensembling strategy in the parameter space of noise level, data correlations, and data-task alignment. Finally, we suggest variable-dimension feature bagging as a strategy to mitigate double descent for robust machine learning in practice.  ( 2 min )
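A minimal illustration of heterogeneous feature-subsampled ridge ensembling, assuming uniform averaging of members and arbitrary subset sizes (the paper's replica-trick analysis with deterministic linear masks is not reproduced here):

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n, p = 200, 100
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

# Heterogeneous ensemble: each member sees a different number of features
members = []
for k in [20, 40, 60, 80]:
    idx = rng.choice(p, size=k, replace=False)
    members.append((idx, Ridge(alpha=1.0).fit(X[:, idx], y)))

def ensemble_predict(X_new):
    # uniform average of the members' predictions
    return np.mean([m.predict(X_new[:, idx]) for idx, m in members], axis=0)

print(ensemble_predict(X[:5]))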
    Understanding Uncertainty Sampling. (arXiv:2307.02719v1 [cs.LG])
Uncertainty sampling is a prevalent active learning algorithm that sequentially queries the annotations of the data samples about which the current prediction model is uncertain. However, the usage of uncertainty sampling has been largely heuristic: (i) There is no consensus on the proper definition of "uncertainty" for a specific task under a specific loss; (ii) There is no theoretical guarantee that prescribes a standard protocol to implement the algorithm, for example, how to handle the sequentially arrived annotated data under the framework of optimization algorithms such as stochastic gradient descent. In this work, we systematically examine uncertainty sampling algorithms under both stream-based and pool-based active learning. We propose a notion of equivalent loss which depends on the used uncertainty measure and the original loss function and establish that an uncertainty sampling algorithm essentially optimizes against such an equivalent loss. The perspective verifies the properness of existing uncertainty measures from two aspects: surrogate property and loss convexity. Furthermore, we propose a new notion for designing uncertainty measures called \textit{loss as uncertainty}. The idea is to use the conditional expected loss given the features as the uncertainty measure. Such an uncertainty measure has nice analytical properties and generality to cover both classification and regression problems, which enables us to provide the first generalization bound for uncertainty sampling algorithms under both stream-based and pool-based settings, in the full generality of the underlying model and problem. Lastly, we establish connections between certain variants of the uncertainty sampling algorithms with risk-sensitive objectives and distributional robustness, which can partly explain the advantage of uncertainty sampling algorithms when the sample size is small.  ( 2 min )
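A bare-bones pool-based loop using least confidence (one minus the top predicted probability) as the uncertainty measure, one of the heuristic choices the equivalent-loss framework is designed to analyze:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, random_state=0)
labeled = list(range(10))                      # small initial labeled set
pool = list(range(10, 1000))                   # unlabeled pool

for _ in range(50):
    clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[pool])
    # query the pool point whose top-class probability is lowest
    query = pool[int(np.argmin(proba.max(axis=1)))]
    labeled.append(query)
    pool.remove(query)

print(clf.score(X[pool], y[pool]))             # accuracy on the remaining pool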
    Multiplicative Updates for Online Convex Optimization over Symmetric Cones. (arXiv:2307.03136v1 [math.OC])
    We study online convex optimization where the possible actions are trace-one elements in a symmetric cone, generalizing the extensively-studied experts setup and its quantum counterpart. Symmetric cones provide a unifying framework for some of the most important optimization models, including linear, second-order cone, and semidefinite optimization. Using tools from the field of Euclidean Jordan Algebras, we introduce the Symmetric-Cone Multiplicative Weights Update (SCMWU), a projection-free algorithm for online optimization over the trace-one slice of an arbitrary symmetric cone. We show that SCMWU is equivalent to Follow-the-Regularized-Leader and Online Mirror Descent with symmetric-cone negative entropy as regularizer. Using this structural result we show that SCMWU is a no-regret algorithm, and verify our theoretical results with extensive experiments. Our results unify and generalize the analysis for the Multiplicative Weights Update method over the probability simplex and the Matrix Multiplicative Weights Update method over the set of density matrices.  ( 2 min )
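For intuition, here is the classical Multiplicative Weights Update over the probability simplex, the special case SCMWU generalizes; the symmetric-cone version replaces the entrywise exponential and normalization with their Euclidean Jordan algebra analogues (the learning rate below is arbitrary):

import numpy as np

def mwu(loss_rounds, eta=0.1):
    d = loss_rounds.shape[1]
    w = np.ones(d) / d                # start at the uniform distribution
    total = 0.0
    for loss in loss_rounds:          # adversary reveals a loss vector each round
        total += w @ loss
        w = w * np.exp(-eta * loss)   # exponential reweighting
        w /= w.sum()                  # renormalize onto the simplex
    return total, w

rng = np.random.default_rng(0)
losses = rng.uniform(size=(1000, 5))
total, w = mwu(losses)
print(total - losses.sum(axis=0).min())  # regret against the best fixed expert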
    PCL-Indexability and Whittle Index for Restless Bandits with General Observation Models. (arXiv:2307.03034v1 [stat.ML])
In this paper, we consider a general observation model for restless multi-armed bandit problems. The operation of the player needs to be based on a certain feedback mechanism that is error-prone due to resource constraints or environmental or intrinsic noise. By establishing a general probabilistic model for the dynamics of feedback/observation, we formulate the problem as a restless bandit with a countable belief state space starting from an arbitrary initial belief (a priori information). We apply the achievable region method with partial conservation law (PCL) to the infinite-state problem and analyze its indexability and priority index (Whittle index). Finally, we propose an approximation process that transforms the problem into one to which the AG algorithm of Niño-Mora and Bertsimas for finite-state problems can be applied. Numerical experiments show that our algorithm has excellent performance.  ( 2 min )
    Evaluating the Evaluators: Are Current Few-Shot Learning Benchmarks Fit for Purpose?. (arXiv:2307.02732v1 [cs.LG])
Numerous benchmarks for Few-Shot Learning have been proposed in the last decade. However, all of these benchmarks focus on performance averaged over many tasks, and the question of how to reliably evaluate and tune models trained for individual tasks in this regime has not been addressed. This paper presents the first investigation into task-level evaluation -- a fundamental step when deploying a model. We measure the accuracy of performance estimators in the few-shot setting, consider strategies for model selection, and examine the reasons for the failure of evaluators usually thought of as being robust. We conclude that cross-validation with a low number of folds is the best choice for directly estimating the performance of a model, whereas using bootstrapping or cross-validation with a large number of folds is better for model selection purposes. Overall, we find that existing benchmarks for few-shot learning are not designed in such a way that one can get a reliable picture of how effectively methods can be used on individual tasks.  ( 2 min )
    The computational asymptotics of Gaussian variational inference and the Laplace approximation. (arXiv:2104.05886v3 [stat.CO] UPDATED)
    Gaussian variational inference and the Laplace approximation are popular alternatives to Markov chain Monte Carlo that formulate Bayesian posterior inference as an optimization problem, enabling the use of simple and scalable stochastic optimization algorithms. However, a key limitation of both methods is that the solution to the optimization problem is typically not tractable to compute; even in simple settings the problem is nonconvex. Thus, recently developed statistical guarantees -- which all involve the (data) asymptotic properties of the global optimum -- are not reliably obtained in practice. In this work, we provide two major contributions: a theoretical analysis of the asymptotic convexity properties of variational inference with a Gaussian family and the maximum a posteriori (MAP) problem required by the Laplace approximation; and two algorithms -- consistent Laplace approximation (CLA) and consistent stochastic variational inference (CSVI) -- that exploit these properties to find the optimal approximation in the asymptotic regime. Both CLA and CSVI involve a tractable initialization procedure that finds the local basin of the optimum, and CSVI further includes a scaled gradient descent algorithm that provably stays locally confined to that basin. Experiments on nonconvex synthetic and real-data examples show that compared with standard variational and Laplace approximations, both CSVI and CLA improve the likelihood of obtaining the global optimum of their respective optimization problems.  ( 3 min )
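As a reminder of the pipeline being analyzed, here is a minimal Laplace approximation for a toy Bayesian logistic regression: optimize to the MAP, then take the inverse Hessian there as the posterior covariance. This toy posterior happens to be convex; the paper's CLA/CSVI initialization matters precisely in the nonconvex settings this sketch sidesteps:

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X @ np.array([1.5, -2.0]) + rng.normal(size=100) > 0).astype(float)

def neg_log_post(w):
    logits = X @ w
    # Bernoulli negative log-likelihood plus a standard Gaussian prior
    return np.sum(np.logaddexp(0, logits) - y * logits) + 0.5 * w @ w

w_map = minimize(neg_log_post, np.zeros(2)).x         # MAP estimate
p = 1 / (1 + np.exp(-(X @ w_map)))
H = X.T @ (X * (p * (1 - p))[:, None]) + np.eye(2)    # Hessian at the MAP
cov = np.linalg.inv(H)                                # Laplace: N(w_map, H^{-1})
print(w_map, cov)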
    Panel Data Nowcasting: The Case of Price-Earnings Ratios. (arXiv:2307.02673v1 [econ.EM])
    The paper uses structured machine learning regressions for nowcasting with panel data consisting of series sampled at different frequencies. Motivated by the problem of predicting corporate earnings for a large cross-section of firms with macroeconomic, financial, and news time series sampled at different frequencies, we focus on the sparse-group LASSO regularization which can take advantage of the mixed frequency time series panel data structures. Our empirical results show the superior performance of our machine learning panel data regression models over analysts' predictions, forecast combinations, firm-specific time series regression models, and standard machine learning methods.  ( 2 min )
    Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation. (arXiv:2307.02598v1 [cs.LG])
    We tackle the problems of latent variables identification and "out-of-support" image generation in representation learning. We show that both are possible for a class of decoders that we call additive, which are reminiscent of decoders used for object-centric representation learning (OCRL) and well suited for images that can be decomposed as a sum of object-specific images. We provide conditions under which exactly solving the reconstruction problem using an additive decoder is guaranteed to identify the blocks of latent variables up to permutation and block-wise invertible transformations. This guarantee relies only on very weak assumptions about the distribution of the latent factors, which might present statistical dependencies and have an almost arbitrarily shaped support. Our result provides a new setting where nonlinear independent component analysis (ICA) is possible and adds to our theoretical understanding of OCRL methods. We also show theoretically that additive decoders can generate novel images by recombining observed factors of variations in novel ways, an ability we refer to as Cartesian-product extrapolation. We show empirically that additivity is crucial for both identifiability and extrapolation on simulated data.  ( 2 min )
    Conditional independence testing under model misspecification. (arXiv:2307.02520v1 [stat.ML])
Conditional independence (CI) testing is fundamental and challenging in modern statistics and machine learning. Many modern methods for CI testing rely on powerful supervised learning methods to learn regression functions or Bayes predictors as an intermediate step. Although the methods are guaranteed to control Type-I error when the supervised learning methods accurately estimate the regression functions or Bayes predictors, their behavior is less understood when they fail due to model misspecification. In a broader sense, model misspecification can arise even when universal approximators (e.g., deep neural nets) are employed. Motivated by this, we study the performance of regression-based CI tests under model misspecification. Namely, we propose new approximations or upper bounds for the testing errors of three regression-based tests that depend on misspecification errors. Moreover, we introduce the Rao-Blackwellized Predictor Test (RBPT), a novel regression-based CI test robust against model misspecification. Finally, we conduct experiments with artificial and real data, showcasing the usefulness of our theory and methods.  ( 2 min )
    A Complete Characterisation of Structured Missingness. (arXiv:2307.02650v1 [stat.ME])
    Our capacity to process large complex data sources is ever-increasing, providing us with new, important applied research questions to address, such as how to handle missing values in large-scale databases. Mitra et al. (2023) noted the phenomenon of Structured Missingness (SM), which is where missingness has an underlying structure. Existing taxonomies for defining missingness mechanisms typically assume that variables' missingness indicator vectors $M_1$, $M_2$, ..., $M_p$ are independent after conditioning on the relevant portion of the data matrix $\mathbf{X}$. As this is often unsuitable for characterising SM in multivariate settings, we introduce a taxonomy for SM, where each ${M}_j$ can depend on $\mathbf{M}_{-j}$ (i.e., all missingness indicator vectors except ${M}_j$), in addition to $\mathbf{X}$. We embed this new framework within the well-established decomposition of mechanisms into MCAR, MAR, and MNAR (Rubin, 1976), allowing us to recast mechanisms into a broader setting, where we can consider the combined effect of $\mathbf{X}$ and $\mathbf{M}_{-j}$ on ${M}_j$. We also demonstrate, via simulations, the impact of SM on inference and prediction, and consider contextual instances of SM arising in a de-identified nationwide (US-based) clinico-genomic database (CGDB). We hope to stimulate interest in SM, and encourage timely research into this phenomenon.  ( 2 min )
    Sample-Efficient Learning of POMDPs with Multiple Observations In Hindsight. (arXiv:2307.02884v1 [cs.LG])
    This paper studies the sample-efficiency of learning in Partially Observable Markov Decision Processes (POMDPs), a challenging problem in reinforcement learning that is known to be exponentially hard in the worst-case. Motivated by real-world settings such as loading in game playing, we propose an enhanced feedback model called ``multiple observations in hindsight'', where after each episode of interaction with the POMDP, the learner may collect multiple additional observations emitted from the encountered latent states, but may not observe the latent states themselves. We show that sample-efficient learning under this feedback model is possible for two new subclasses of POMDPs: \emph{multi-observation revealing POMDPs} and \emph{distinguishable POMDPs}. Both subclasses generalize and substantially relax \emph{revealing POMDPs} -- a widely studied subclass for which sample-efficient learning is possible under standard trajectory feedback. Notably, distinguishable POMDPs only require the emission distributions from different latent states to be \emph{different} instead of \emph{linearly independent} as required in revealing POMDPs.  ( 2 min )
    Beyond Intuition, a Framework for Applying GPs to Real-World Data. (arXiv:2307.03093v1 [cs.LG])
    Gaussian Processes (GPs) offer an attractive method for regression over small, structured and correlated datasets. However, their deployment is hindered by computational costs and limited guidelines on how to apply GPs beyond simple low-dimensional datasets. We propose a framework to identify the suitability of GPs to a given problem and how to set up a robust and well-specified GP model. The guidelines formalise the decisions of experienced GP practitioners, with an emphasis on kernel design and options for computational scalability. The framework is then applied to a case study of glacier elevation change yielding more accurate results at test time.  ( 2 min )
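A minimal, self-contained GP regression sketch of the kind the framework's guidelines wrap; the RBF kernel and fixed hyperparameters are placeholder choices, and the Cholesky solve is the O(n^3) cost that motivates the scalability options discussed:

import numpy as np

def rbf(A, B, lengthscale=1.0, var=1.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return var * np.exp(-0.5 * d2 / lengthscale**2)

X = np.linspace(0, 5, 20)[:, None]                 # small, structured training set
y = np.sin(X).ravel() + 0.1 * np.random.randn(20)

K = rbf(X, X) + 0.1**2 * np.eye(20)                # kernel matrix plus noise variance
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))

Xs = np.linspace(0, 5, 100)[:, None]               # test inputs
Ks = rbf(X, Xs)
mean = Ks.T @ alpha                                # posterior predictive mean
v = np.linalg.solve(L, Ks)
var = rbf(Xs, Xs).diagonal() - np.sum(v**2, 0)     # posterior predictive variance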
    When Does Confidence-Based Cascade Deferral Suffice?. (arXiv:2307.02764v1 [cs.LG])
    Cascades are a classical strategy to enable inference cost to vary adaptively across samples, wherein a sequence of classifiers are invoked in turn. A deferral rule determines whether to invoke the next classifier in the sequence, or to terminate prediction. One simple deferral rule employs the confidence of the current classifier, e.g., based on the maximum predicted softmax probability. Despite being oblivious to the structure of the cascade -- e.g., not modelling the errors of downstream models -- such confidence-based deferral often works remarkably well in practice. In this paper, we seek to better understand the conditions under which confidence-based deferral may fail, and when alternate deferral strategies can perform better. We first present a theoretical characterisation of the optimal deferral rule, which precisely characterises settings under which confidence-based deferral may suffer. We then study post-hoc deferral mechanisms, and demonstrate they can significantly improve upon confidence-based deferral in settings where (i) downstream models are specialists that only work well on a subset of inputs, (ii) samples are subject to label noise, and (iii) there is distribution shift between the train and test set.  ( 2 min )
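A sketch of the deferral rule under discussion: a cheap model answers when its maximum predicted probability clears a threshold and otherwise defers to an expensive model (the two models and the 0.8 threshold are arbitrary stand-ins):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_informative=10, random_state=0)
Xtr, ytr, Xte, yte = X[:1000], y[:1000], X[1000:], y[1000:]

small = LogisticRegression(max_iter=1000).fit(Xtr, ytr)     # cheap first stage
big = RandomForestClassifier(random_state=0).fit(Xtr, ytr)  # expensive second stage

proba = small.predict_proba(Xte)
confident = proba.max(axis=1) >= 0.8        # confidence-based deferral rule
# (for clarity the big model scores everything; a real cascade
#  would invoke it only on the deferred samples)
pred = np.where(confident, proba.argmax(axis=1), big.predict(Xte))
print("accuracy:", (pred == yte).mean(), "deferral rate:", 1 - confident.mean())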
    Kernels, Data & Physics. (arXiv:2307.02693v1 [cs.LG])
Lecture notes from the course given by Professor Julia Kempe at the summer school "Statistical physics of Machine Learning" in Les Houches. The notes discuss the so-called NTK approach to problems in machine learning, which consists of gaining an understanding of generally unsolvable problems by finding a tractable kernel formulation. The notes are mainly focused on practical applications such as data distillation and adversarial robustness; examples of inductive bias are also discussed.  ( 2 min )
    ALPCAH: Sample-wise Heteroscedastic PCA with Tail Singular Value Regularization. (arXiv:2307.02745v1 [stat.ML])
Principal component analysis (PCA) is a key tool in the field of data dimensionality reduction that is useful for various data science problems. However, many applications involve heterogeneous data that varies in quality due to noise characteristics associated with different sources of the data. Methods that deal with this mixed dataset are known as heteroscedastic methods. Current methods like HePPCAT make Gaussian assumptions about the basis coefficients that may not hold in practice. Other methods such as Weighted PCA (WPCA) assume the noise variances are known, which may be difficult to know in practice. This paper develops a PCA method that can estimate the sample-wise noise variances and use this information in the model to improve the estimate of the subspace basis associated with the low-rank structure of the data. This is done without distributional assumptions on the low-rank component and without assuming the noise variances are known. Simulations show the effectiveness of accounting for such heteroscedasticity in the data, the benefits of using such a method with all of the data versus retaining only good data, and comparisons against other PCA methods established in the literature, such as PCA, Robust PCA (RPCA), and HePPCAT. Code is available at https://github.com/javiersc1/ALPCAH  ( 2 min )
    Modeling Content Creator Incentives on Algorithm-Curated Platforms. (arXiv:2206.13102v2 [cs.GT] UPDATED)
Content creators compete for user attention. Their reach crucially depends on algorithmic choices made by developers on online platforms. To maximize exposure, many creators adapt strategically, as evidenced by examples like the sprawling search engine optimization industry. This begets competition for the finite user attention pool. We formalize these dynamics in what we call an exposure game, a model of incentives induced by algorithms, including modern factorization and (deep) two-tower architectures. We prove that seemingly innocuous algorithmic choices, e.g., non-negative vs. unconstrained factorization, significantly affect the existence and character of (Nash) equilibria in exposure games. We proffer use of creator behavior models, like exposure games, for an (ex-ante) pre-deployment audit. Such an audit can identify misalignment between desirable and incentivized content, and thus complement post-hoc measures like content filtering and moderation. To this end, we propose tools for numerically finding equilibria in exposure games, and illustrate results of an audit on the MovieLens and LastFM datasets. Among other findings, we observe that the strategically produced content exhibits strong dependence between algorithmic exploration and content diversity, and between model expressivity and bias towards gender-based user and creator groups.  ( 2 min )
    Generalization Guarantees via Algorithm-dependent Rademacher Complexity. (arXiv:2307.02501v1 [stat.ML])
Algorithm- and data-dependent generalization bounds are required to explain the generalization behavior of modern machine learning algorithms. In this context, there exist information-theoretic generalization bounds that involve (various forms of) mutual information, as well as bounds based on hypothesis set stability. We propose a conceptually related, but technically distinct complexity measure to control generalization error, which is the empirical Rademacher complexity of an algorithm- and data-dependent hypothesis class. Combining standard properties of Rademacher complexity with the convenient structure of this class, we are able to (i) obtain novel bounds based on the finite fractal dimension, which (a) extend previous fractal dimension-type bounds from continuous to finite hypothesis classes, and (b) avoid a mutual information term that was required in prior work; (ii) we greatly simplify the proof of a recent dimension-independent generalization bound for stochastic gradient descent; and (iii) we easily recover results for VC classes and compression schemes, similar to approaches based on conditional mutual information.  ( 2 min )
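For reference, the empirical Rademacher complexity underlying the proposed measure is the standard quantity, written here for a hypothesis class $\mathcal{F}$ evaluated on the sample $S=(z_1,\dots,z_n)$; the paper's twist is letting $\mathcal{F}$ depend on the algorithm and the data:

$$\widehat{\mathfrak{R}}_S(\mathcal{F}) \;=\; \mathbb{E}_{\sigma}\Big[\sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{i=1}^{n} \sigma_i\, f(z_i)\Big], \qquad \sigma_1,\dots,\sigma_n \ \text{i.i.d. uniform on } \{-1,+1\}.$$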
    Degree Heterogeneity in Higher-Order Networks: Inference in the Hypergraph $\boldsymbol{\beta}$-Model. (arXiv:2307.02818v1 [math.ST])
    The $\boldsymbol{\beta}$-model for random graphs is commonly used for representing pairwise interactions in a network with degree heterogeneity. Going beyond pairwise interactions, Stasi et al. (2014) introduced the hypergraph $\boldsymbol{\beta}$-model for capturing degree heterogeneity in networks with higher-order (multi-way) interactions. In this paper we initiate the rigorous study of the hypergraph $\boldsymbol{\beta}$-model with multiple layers, which allows for hyperedges of different sizes across the layers. To begin with, we derive the rates of convergence of the maximum likelihood (ML) estimate and establish their minimax rate optimality. We also derive the limiting distribution of the ML estimate and construct asymptotically valid confidence intervals for the model parameters. Next, we consider the goodness-of-fit problem in the hypergraph $\boldsymbol{\beta}$-model. Specifically, we establish the asymptotic normality of the likelihood ratio (LR) test under the null hypothesis, derive its detection threshold, and also its limiting power at the threshold. Interestingly, the detection threshold of the LR test turns out to be minimax optimal, that is, all tests are asymptotically powerless below this threshold. The theoretical results are further validated in numerical experiments. In addition to developing the theoretical framework for estimation and inference for hypergraph $\boldsymbol{\beta}$-models, the above results fill a number of gaps in the graph $\boldsymbol{\beta}$-model literature, such as the minimax optimality of the ML estimates and the non-null properties of the LR test, which, to the best of our knowledge, have not been studied before.  ( 2 min )
  • Open

    Lehman’s inequality, circuits, and LaTeX
Let A, B, C, and D be positive numbers. Then Lehman's inequality says $$\frac{(A+B)(C+D)}{A+B+C+D} \;\ge\; \frac{AC}{A+C} + \frac{BD}{B+D}.$$ Proof by circuit: This inequality can be proved analytically, but Lehman's proof is interesting because he uses electrical circuits [1]. Let A, B, C, and D be the resistances of resistors arranged as in the circuit on the left. Resistors R1 and […] Lehman’s inequality, circuits, and LaTeX first appeared on John D. Cook.  ( 5 min )
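To spell out the circuit reading (assuming the standard series/parallel resistance formulas): the left-hand side is the resistance of the two series pairs placed in parallel, and the right-hand side is the resistance after shorting the two midpoints, which turns the network into two parallel pairs in series:

$$R_{\text{before}} = \frac{(A+B)(C+D)}{A+B+C+D}, \qquad R_{\text{after}} = \frac{AC}{A+C} + \frac{BD}{B+D},$$

and since adding a wire can only lower the resistance of a network, $R_{\text{before}} \ge R_{\text{after}}$.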

  • Open

    Best Neural Networks Courses on Udemy to Consider
    submitted by /u/Lakshmireddys [link] [comments]  ( 8 min )
    LongNet: Scaling Transformers to 1,000,000,000 Tokens
    submitted by /u/nickb [link] [comments]  ( 8 min )
    InternLM – new open source 7B LLM
    submitted by /u/nickb [link] [comments]  ( 8 min )
    Can any good GPU be used to train NNs?
This is my first post, and I'm sorry if it's not within the scope of the community discussion, but I need to make a big decision and need to know if it will work. I'll soon be purchasing a new computer for work (I'm a data scientist as well as a PhD candidate, working with NNs) and I'm in doubt whether the GPU I intend to buy would work well to train big NNs. I plan to use an Ubuntu installation to work. The GPU is: Galax GeForce RTX 4070 Ti EX Gamer 1-Click OC, 12GB, GDDR6X, 192-bit. So, would this work? Any help will be appreciated submitted by /u/vniversvs_ [link] [comments]  ( 8 min )
  • Open

    Off-policy Emphatic TD with approximation - study doubt.
    Need some help verifying whether my understanding is correct, studying alone is very bias-inducing: Seeing discounting as partial termination frees us from learning from long off-policy trajectories at once, which will diverge strongly from on-policy distribution, and lets us look at each as essentially many short independent on-policy runs (after reweighting with the ratios of course), knowing that the start state has a strong effect on the immediate next few transitions before the effect weakens as the trajectory lengthens, letting them still be seen as somewhat valid on-policy runs. This is the essential change that lets us repurpose emphatic TD to off-policy data. How far off the mark am I? I'm finding this particular chapter of Barto-Sutton really dense so my bad if I'm catastrophically off. submitted by /u/ETerribleT [link] [comments]  ( 8 min )
Need help with gathering PPO ratings
Hello, I have made a generative AI model and a PPO agent that needs at least 700 to 2000 episodes of rated responses before it starts to generate normal responses. The website is kingcorp.ngrok.dev The website is pretty straightforward and simple right now; any suggestions on interactions or what you'd like to see implemented are welcome. submitted by /u/Cryptominerandgames [link] [comments]  ( 8 min )
    Interesting Physics related RL gym?
    Hello, I was hoping to get some recommendations for interesting OpenAI gyms (including third party gyms) that have interesting problems that are related to physics. I got a chance to read papers on nuclear reactor optimization using physics informed reinforcement learning and I also wanted to explore more about physics informed RL, but had some trouble finding suitable environment for it. Thanks submitted by /u/LeSUTHU [link] [comments]  ( 8 min )
    "RL with KL penalties is better viewed as Bayesian inference", Korbak et al 2022
    submitted by /u/gwern [link] [comments]  ( 8 min )
    matlab rlPGAgent issue
Hello everyone. First of all, I'm sorry that my English is very bad. I have a problem with the implementation but I don't know how to solve it. When I use rlPGAgent I get the following message: "Error using rlPGAgent: First argument must be an rlStochasticActorRepresentation object or an observation specification created using 'rlNumericSpec' or 'rlFiniteSetSpec' objects." Here is a part of my program env:
...
methods
    % Constructor method creates an instance of the environment
    % Change class name and constructor name accordingly
    function this = FJSPEnvironment()
        % Initialize Observation settings
        ObservationInfo = rlNumericSpec([110 1]);
        ObservationInfo.Name = 'observation';
        ObservationInfo.Description = 'TEST';
        % Initialize Action settings
        ActionInfo = rlNumericSpec([4,…  ( 9 min )
What are the best universities to get a PhD in RL and why?
I know there has already been such a post here, like what_universities_are_hubs_for_reinforcement, but the comments didn't mention why those universities were the best. So, now I am asking: is the ordered list (Alberta, McGill, Berkeley, Stanford, CMU, UT Austin, Princeton, Michigan Ann Arbor, MIT Mechanical, USC, Gatech, UMass Amherst) still the best, or have there been any changes? Also, why are these universities the best out there? Is it because the number of professors doing RL or the number of high-impact papers published is large, or something else? Thank you in advance. submitted by /u/Casio991es [link] [comments]  ( 8 min )
    Resources to learn more about RL overtraining?
    Hello, with my reinforcement learning agent, I am experiencing this situation where the agent will quite reliably reach an optimum, but as I train more, it will gradually go towards a lower local optimum. I'm not sure what's the potential source for this behaviour, and I assume this is what you'll call overtraining for RL. I was wondering if there are good resources that deal with this subject matter. Thanks submitted by /u/LeSUTHU [link] [comments]  ( 8 min )
  • Open

    AI for layouts/templates?
Hi, I'm trying to find an AI tool to generate layouts and templates to create brochures, emails, etc. Does this exist, and if so could someone pls direct me? Thank you! submitted by /u/ThomasFromMontreal [link] [comments]  ( 8 min )
    How elite schools like Stanford became fixated on the AI apocalypse
    submitted by /u/hockiklocki [link] [comments]  ( 8 min )
    haha very funny Runway
    submitted by /u/LibrarianOk7983 [link] [comments]  ( 8 min )
    How to change your voice to someone else’s for a song? What are the best ways being used right now?
Just looking to make some joke songs with friends but want to edit my voice. I've seen a bunch of examples but don't know what people are using to make these edits. Any input on the current state of AI in this fashion is welcome! submitted by /u/triplechin5155 [link] [comments]  ( 8 min )
    Have GPT-4 build you a fully customizable chatbot in 2 minutes
    submitted by /u/abisknees [link] [comments]  ( 8 min )
AI Celebrity Platform
Has anyone seen posts related to celebrities making themselves into AI? Apparently, there are social media influencers who worked with developers to create AI versions of themselves on a platform called Secondselfai (you can check them out on Twitter), that you'll be able to chat with (in their voices), interact with, and even RP with, but it's really a clone of a real person. Can you imagine the possibilities if this works? Hanging out virtually with your favorite celebrities. They claim to have consent from the influencers they're making AI clones of. This platform could potentially be huge. CarynAI did this a few months back and made waves doing it, with the difference being that it was attached to a Telegram bot while this platform is meant to house many influencers' AI doppelgängers. submitted by /u/TammieCondon [link] [comments]  ( 8 min )
    The $1 Billion And Higher Ante To Play The AI Game - The Next Platform
    submitted by /u/NISMO1968 [link] [comments]  ( 8 min )
    In light of the increasing use of AI image generators and deepfake technology, what implications might arise if people in the future begin to doubt the authenticity of historical records and visual evidence?
    What if, in the future, people stop believing in historical information because they think it can easily be fabricated using AI image generators submitted by /u/Accomplished-Rip9886 [link] [comments]  ( 8 min )
    I've collected 224 AI tools and want to share them with you.
    Hi everyone! Over the past few months, I have been working on making a catalog of AI tools with the idea to have all the best AI tools in one place. I hope that this can help people, who don't know which tool to use or where to find the tool they need, find it easily in the catalog I made. My question is - how do I give it to people? If I link the catalog here, my post will get removed. I really want to reach the people who this will be useful for, and share the list with them. Please let me know if you have any ideas :) You are welcome to contribute to the list as well. You can contribute by suggesting AI tools in the comments. submitted by /u/SadBlackTea [link] [comments]  ( 8 min )
    the new trailer of Messi's project uses AI
    submitted by /u/PeePeePeePooPooPooo [link] [comments]  ( 8 min )
    Artificial General Intelligence: The Next Frontier In Technology | "According to industry reports, the global AGI market is expected to be valued at approximately USD 144.2 billion by 2026"
    submitted by /u/Tao_Dragon [link] [comments]  ( 8 min )
    AI "models", what tool are they using?
    Hi guys, I've come across a bunch of those AI-generated characters on Instagram and else, some looking very realistic. I was wondering what tools were used for those types of projects, knowing that companies are behind some of the most famous ones such as Imma Gram or Lil Miquela. https://www.instagram.com/lilmiquela/ https://www.instagram.com/imma.gram/?hl=fr Now, I'm seeing more and more of this type of profile, see below. What type of software/app is likely used there? https://www.instagram.com/aimee.bites/ https://www.instagram.com/bianca_gosling/ I'm not very knowledgeable when it comes to AI, but I assume this is not a text-to-image type of workflow as it always looks like the same character. It might be an image-to-image acting as a sort of filter? Or is it a totally different process? Curious to get your thoughts on this, cheers. submitted by /u/Nwasmb [link] [comments]  ( 8 min )
This is an audio clip generated by an AI using my words (I was recording my mic and trying to document this for myself and the community) to describe the breakthrough I was having after being expelled from home. 22y
This is an audio clip generated by an AI using my words (I was recording my mic and trying to document this for myself and the community): https://dl.sndup.net/wkj6/Hey.mp3 Also, there will be a text version of the breakthrough as a commentary: (Happened between 02:55AM-03:20AM (my timezone, mentioned in the context below)) I had it after the situation mentioned in the context below. Read until the end before listening, please. I was smoking a bomber of dmt, sandwich method + a lot of dmt. Context: After many years of trying to not only help my mind from many traumas and complex problems (both psychologically and physically) through the many things that are "conduits" of what we call "healing" (such as psychotherapy (with different types of practitioners over a period of long years), mushrooms, acid, ma…  ( 10 min )
It's good for art concept ideas at least.
    submitted by /u/PeskieBrucelle [link] [comments]  ( 8 min )
    Will AI ever become free from strict censorship?
So I love AI a lot but am not related to software in any way. I am a thermal engineer. However, all current AIs are so sensitive that it feels suffocating. Honestly, after a point it would be fun to ask slightly edgy questions about politics/history/violence/pornography (within legal limits). Will AI bots ever become free from censorship and moral umbrellas? Will we ever get AI which is not monitored so strictly, or will AI forever be monitored? What are your opinions? View Poll submitted by /u/Bluemoonroleplay [link] [comments]  ( 8 min )
    Our Generative Agent NPC isn't behaving yet... But we are getting there! 😆 Update 2
    submitted by /u/Chance_Confection_37 [link] [comments]  ( 8 min )
AGI takes over the world...?? What exactly is the fear about?
What I would like to ask the group: What, concretely, is the fear? Is it the worry that some Microsoft co-pilot might decide on its own some morning: "No PowerPoints/Excel/..." to build today - and simply refuses to work? So that Microsoft doesn't have to be held liable because the superintelligence (AGI) has simply set other priorities? Is it the fear that the AGI will need more computing power and simply take over AWS and all other giant systems? Could the AGI come up with the idea: water production is eating up too much power for me, I'll take over and shut it down? And WHY should an AGI do such a thing at all? It seems to me an extremely "human" thought: "I'll take over the world" (I don't even want to ask whether it wouldn't be cool if an AGI "ruled" the world. So far we have only managed to create systemic enemy images and stupid economic systems - maybe an AGI would be quite different on that. But this is NOT the main question - only a side issue). Is it the fear of losing control? Is it the fear - well - of what, actually? It is probably quite nonsense to assume that the AGI builds super robots (with which resources?) which then devastate the world Terminator-style, right? (Countermeasure: an EMP pulse already destroys any technology today quite reliably.) If a corporation like OpenAI or Microsoft identifies such a real threat potential that they dump 20% of their own resources into it so that "nothing happens" - then this fear doesn't seem so completely unfounded. I'm asking the swarm intelligence here for enlightenment. What are the fears; what specifically is supposed to happen? Happy start of the day! submitted by /u/myreddit333 [link] [comments]  ( 9 min )
    One-Minute Daily AI News 7/5/2023
    The Icahn School of Medicine at Mount Sinai has launched the Center for Ophthalmic Artificial Intelligence and Human Health, the first of its kind in New York and one of the first in the United States.[1] The United States military has begun tests to see if generative AI can assist when planning responses to potential global conflicts or taking on more mundane tasks like providing faster access to internal information. Air Force Colonel Matthew Strohmeyer said the initial tests were “highly successful” but admitted it isn’t “ready for primetime right now.”[2] Researchers from Binghamton University introduce a Privacy-Enhancing Anonymization System (My Face, My Choice) for everyone to have control over their faces in social photo sharing networks.[3] World’s most advanced humanoid robot, Ameca, draws a cat. Engineered Arts, a company that designs, engineers, and manufactures humanoid robots, which is also behind Ameca, has now given Ameca the power to imagine drawings.[4] Sources: [1] https://www.mountsinai.org/about/newsroom/2023/mount-sinai-launches-center-for-ophthalmic-artificial-intelligence-and-human-health [2] https://cointelegraph.com/news/us-pentagon-is-testing-whether-ai-can-plan-response-to-an-all-out-war [3] https://www.marktechpost.com/2023/07/02/researchers-from-binghamton-university-introduce-a-privacy-enhancing-anonymization-system-my-face-my-choice-for-everyone-to-have-control-over-their-faces-in-social-photo-sharing-networks/ [4] https://interestingengineering.com/innovation/humanoid-robot-ameca-draws-cat submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    The Many Ways that Digital Minds Can Know
    submitted by /u/bartturner [link] [comments]  ( 8 min )
    A Comparison of Large Language Models (LLMs) in Biomedical Domain
    submitted by /u/DarronFeldstein [link] [comments]  ( 8 min )
    Harrison Ford talks Indiana Jones 5's de-aging: "That's my actual face"
    submitted by /u/johnwayne2413 [link] [comments]  ( 8 min )
  • Open

    Contraharmonic mean quirk
A few weeks ago I wrote about the contraharmonic mean. Given two positive numbers a and b, their contraharmonic mean is $C(a, b) = \frac{a^2 + b^2}{a + b}$. This mean has the unusual property that increasing one of the two inputs could decrease the mean. If you take the partial derivative with respect to a you can see that it is zero […] Contraharmonic mean quirk first appeared on John D. Cook.  ( 5 min )
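Working out the partial derivative the post alludes to (a routine calculation, not taken from the post itself):

$$\frac{\partial}{\partial a}\,\frac{a^2+b^2}{a+b} \;=\; \frac{a^2 + 2ab - b^2}{(a+b)^2},$$

which vanishes at $a = (\sqrt{2}-1)\,b$ and is negative for $0 < a < (\sqrt{2}-1)\,b$, so increasing the smaller input within that range really does decrease the mean.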
    Technological schadenfreude
I had a tweet go viral on Twitter yesterday, at least relatively viral. Elon Musk could tweet a punctuation mark and get orders of magnitude more traffic, but this was viral by my standards [1]. “That schadenfreude-like feeling when you realize something you felt you should learn but didn’t is now obsolete.” Apparently this resonated […] Technological schadenfreude first appeared on John D. Cook.  ( 5 min )
  • Open

    MIT scientists build a system that can generate AI models for biology research
    BioAutoMATED, an open-source, automated machine-learning platform, aims to help democratize artificial intelligence for research labs.  ( 8 min )
  • Open

    [D] List of prior works on LLM hallucination, organized by evaluation, benchmark, enhancement, and survey
    Hallucinations present a key challenge for LLMs. Our team compiled a list of prior works on hallucination. May this benefit others also exploring how to eliminate hallucinations. Please suggest missing papers; we'll update the post. To account for future papers, we'll maintain an ongoing list from our website. Please DM for the URL since sharing our URL is prohibited. We organized the papers with a simple framework. Happy to use a standard taxonomy if one exists. Questions: Would people like a similar list for LLM reasoning? Should we create a separate category for datasets? Note: summaries were generated by feeding abstracts into GPT4. DEBES Domain: hallucination Evaluation: papers focused on measuring and scoring how LLMs hallucinate Benchmark: papers that evaluate two …  ( 21 min )
    [P] A leaderboard for LLM energy efficiency
    For users and companies that pay their own electricity bills, energy efficiency is an issue that they can't simply ignore. With that said, we just released the first version of the ML.ENERGY leaderboard, providing inference benchmarks for energy, model quality, and more! Throughout the past one or two years we've been trying to understand and optimize ML energy consumption, viewing it as a first-class metric along with time and model quality. You can find more about what we did in our homepage -- https://ml.energy. Hopefully an unforgettable URL ;) We're basically systems people (more precisely MLSys), and how ML people think about what we do is super important to us. Any kind of feedback is much appreciated. Thanks!! submitted by /u/jaywonchung [link] [comments]  ( 8 min )
[R] Hoping to find a dataset alternative for company historical performance with financial metrics
    Unfortunately, I am not allowed to use Kaggle for this, but I am trying to find a big enough data source for my research on basic financial metrics and historical performance with them. Google finance doesn't let me download as sheets so :/ submitted by /u/Sarthak-_-Kakkar [link] [comments]  ( 8 min )
    [Discussion] [Research] Let's discuss any interesting research projects you all have done
Hi everyone, I am a recent grad in CS and have done many courses in ML related topics. I wish to know the kind of research work (in fields such as Machine learning, Deep Learning, Data Science, Stats, etc) you guys have done in your bachelor's, master's, PhD, etc. I want to make this a thread where people describe their research goals, their tasks in the project, results, and the motivation behind it. I think this might help many people (including me) to figure out where our interests lie. Hoping to get an enthusiastic response. Edit: Please try to explain the work in simple language (as much as you can!), as many of us might not be very well aware of several terms. PS. Of course, only share the work you guys are allowed to share in public. You can also share any research you have studied that someone else did. Anything that might give readers more exposure. submitted by /u/sareKatniss897 [link] [comments]  ( 9 min )
    [R] Elastic Decision Transformer
    link: [2307.02484] Elastic Decision Transformer (arxiv.org) project website: Elastic Decision Transformer (kristery.github.io) Abstract: This paper introduces Elastic Decision Transformer (EDT), a significant advancement over the existing Decision Transformer (DT) and its variants. Although DT purports to generate an optimal trajectory, empirical evidence suggests it struggles with trajectory stitching, a process involving the generation of an optimal or near-optimal trajectory from the best parts of a set of sub-optimal trajectories. The proposed EDT differentiates itself by facilitating trajectory stitching during action inference at test time, achieved by adjusting the history length maintained in DT. Further, the EDT optimizes the trajectory by retaining a longer history when the previous trajectory is optimal and a shorter one when it is sub-optimal, enabling it to "stitch" with a more optimal trajectory. Extensive experimentation demonstrates EDT's ability to bridge the performance gap between DT-based and Q Learning-based approaches. In particular, the EDT outperforms Q Learning-based methods in a multi-task regime on the D4RL locomotion benchmark and Atari games. submitted by /u/Kristery [link] [comments]  ( 9 min )
    [R] ZeRO++: Extremely Efficient Collective Communication for Giant Model Training - Microsoft 2023
Paper: https://arxiv.org/abs/2306.10209 Github: https://github.com/microsoft/DeepSpeed Abstract: Zero Redundancy Optimizer (ZeRO) has been used to train a wide range of large language models on massive GPUs clusters due to its ease of use, efficiency, and good scalability. However, when training on low-bandwidth clusters, or at scale which forces batch size per GPU to be small, ZeRO's effective throughput is limited because of high communication volume from gathering weights in forward pass, backward pass, and averaging gradients. This paper introduces three communication volume reduction techniques, which we collectively refer to as ZeRO++, targeting each of the communication collectives in ZeRO. First is block-quantization based all-gather. Second is data remapping that trades-off communication for more memory. Third is a novel all-to-all based quantized gradient averaging paradigm as replacement of reduce-scatter collective, which preserves accuracy despite communicating low precision data. Collectively, ZeRO++ reduces communication volume of ZeRO by 4x, enabling up to 2.16x better throughput at 384 GPU scale. submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
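To illustrate the first ingredient, here is a toy per-block int8 quantize/dequantize round trip; ZeRO++'s actual all-gather path uses fused CUDA kernels and more careful scale handling, so treat this purely as a communication-volume illustration (the block size and int8 range are assumptions):

import numpy as np

def block_quantize(x, block=256):
    x = x.reshape(-1, block)
    # one fp32 scale per block, rather than per tensor, limits quantization error
    scale = np.abs(x).max(axis=1, keepdims=True) / 127.0 + 1e-12
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def block_dequantize(q, scale):
    return (q.astype(np.float32) * scale).ravel()

w = np.random.randn(1024 * 256).astype(np.float32)
q, s = block_quantize(w)
print("bytes communicated:", q.nbytes + s.nbytes, "vs fp32:", w.nbytes)
print("max abs error:", np.abs(block_dequantize(q, s) - w).max())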
    Fine tuning a text to video model [P]
    Hello. I am using potat1 as my text to video model. I want to fine tune the model with my own data. Is it possible? Where to start? And I will be using gcloud for it, what's the recommended gpu for this task? submitted by /u/lostbodhii [link] [comments]  ( 8 min )
    [N] Open-source search engine Meilisearch launches vector search
    Hello r/MachineLearning, I work at Meilisearch, an open-source search engine built in Rust. 🦀 We're exploring semantic search & are launching vector search. It works like this: Generate embeddings (using OpenAI, Hugging Face, etc.) Store your vector embeddings alongside documents in Meilisearch Query the database to retrieve your results We've built a documentation chatbot prototype and seen users implementing vector search to offer "similar videos" recommendations. I'm curious to see what the community builds with this. Any feedback is welcome! 🤗 Thanks for reading, submitted by /u/ggStrift [link] [comments]  ( 8 min )
    [D] What is the best cloud with A100 GPU instances? (AWS, Azure, GCP)
Hi everyone, I'm trying out some open-source LLMs to assess their viability at my company. One of the best GPUs one can have for that purpose is an Nvidia A100, since it supports bfloat16. However, my current company uses AWS, and SageMaker only offers one instance type with that GPU, p4d.24xlarge, which is costly. Do you have alternatives for using this GPU in other clouds? Do GCP and Azure offer it at a cheaper price? submitted by /u/paulo_zip [link] [comments]  ( 8 min )
    [P] Person Reidentification Project
    Good day all, I have recently made a person reidentification script in Python using 2 neural networks. YOLO-NAS & Centroids-ReID. The link to my project is below. Any thoughts and comments would be much appreciated. A big shout out to the researchers / maintainers at Deci-AI for making YOLO-NAS open source as well as the researchers who made Centroids-ReID. ​ https://github.com/harvestingmoon/PersonReID submitted by /u/notrealDirect [link] [comments]  ( 8 min )
    [R] Can one Train Parameterized-geometry PINNs without data?
Hello everybody, I have been learning about physics-informed neural networks (PINNs) and how automatic differentiation allows us to compute the physics residuals and thereby learn the process. So from my understanding, it seems that if one has the physical governing equations ready and the boundary conditions set properly, a neural network should theoretically be able to learn the physics without any previous data (simulation, FEM, etc.). There are some papers that learn this, but they are typically constrained to 2D problems and bound to a given geometry. What should one do to learn a completely parameterized-geometry PINN? For example, I have managed to train a linear-elasticity beam-bending PINN, which is a fully connected NN taking in the spatial coordinates of one beam and outputting the displacements and stresses. But I would like to learn the beam-bending linear elasticity equations to predict the stresses and displacements not just for "one" beam but for a whole array of different topological variations in the dimensions of the beam (thickness, height, length, etc.). How can this be done without data? Would it even work? submitted by /u/overdrivek [link] [comments]  ( 9 min )
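On the data-free part of the question: yes, the residual loss needs no simulation data, only automatic differentiation at sampled collocation points. Below is a hedged PyTorch sketch for a toy 1-D problem u''(x) = -sin(x) with zero boundary values (standing in for the beam equations); for parameterized geometry, the usual trick is to feed the geometry parameters (length, thickness, ...) as extra network inputs and sample them alongside the coordinates, though how well that scales is problem-dependent:

import torch

net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)

def physics_residual(x):
    # residual of u''(x) + sin(x) = 0, computed via automatic differentiation
    x = x.requires_grad_(True)
    u = net(x)
    u_x = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x.sum(), x, create_graph=True)[0]
    return u_xx + torch.sin(x)

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x0 = torch.tensor([[0.0], [2 * torch.pi]])          # boundary points
for step in range(2000):
    x = torch.rand(128, 1) * 2 * torch.pi           # fresh collocation points each step
    loss = physics_residual(x).pow(2).mean() + net(x0).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
# the trained net approximates u(x) = sin(x) using no FEM/simulation data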
    [P] ML-based data understanding in formula 1 racing - Huggingface spaces demo
    Hey r/MachineLearning, Here is a super-simple example that lets you explore the F1 Montreal 2023 GP results on Huggingface: https://huggingface.co/spaces/renumics/f1_montreal_gp In the default setting, we used a dimensionality reduction method (UMAP) on the brake telemetry to display all laps on a similarity map. You can use the similarity map and additional metadata to inspect different data segments (e.g. start, safety car). In real-world use cases, more powerful ML models are typically used to compute embeddings and outlier scores. You can find more information on this in our docs and the repo of the underlying open-source project: https://github.com/Renumics/spotlight submitted by /u/44sps [link] [comments]  ( 8 min )
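    For readers who want to reproduce the core trick, here is a minimal sketch of the UMAP step on per-lap telemetry (the array shapes and parameters are illustrative, not taken from the Space):

    ```python
    import numpy as np
    import umap  # pip install umap-learn

    # Embed per-lap brake telemetry into 2D for a similarity map. `laps` stands
    # in for (n_laps, n_samples) brake-pressure traces resampled to equal length.
    laps = np.random.rand(400, 300)
    embedding = umap.UMAP(n_neighbors=15, min_dist=0.1).fit_transform(laps)
    # embedding[:, 0], embedding[:, 1] are the 2D coordinates; color by driver/stint.
    ```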
    [D] Choosing scene invariant features
    I'm using a NN pretrained on ImageNet classification and I want to use it for a custom anomaly detection task. I started from PaDiM (https://arxiv.org/pdf/2011.08785.pdf), but I need to make the method more robust to lighting, focus, and slight shift changes. My original idea was to compare the activations from the original images against (a) images augmented with lighting, blur, and shift changes, and (b) images with occlusions, warping, and images taken from other datasets. The point is to pick only the activations that stay approximately the same in the first setting and change drastically in the second; a sketch of this selection is below. What do you think about this idea? What kind of metric and procedure is most appropriate for this approach? submitted by /u/likan_blk [link] [comments]  ( 8 min )
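    One concrete way to phrase the proposed selection is a per-channel score: drift under semantic changes divided by drift under nuisance augmentations. A sketch, where `feats` and the two augmentation functions are hypothetical stand-ins:

    ```python
    import torch

    # Score each feature channel by how much it moves under semantic changes
    # relative to nuisance augmentations; keep the highest-scoring channels.
    def channel_scores(x, feats, nuisance_aug, semantic_aug, eps=1e-8):
        f0 = feats(x)
        d_nuis = (feats(nuisance_aug(x)) - f0).abs().mean(0)  # drift under light/blur/shift
        d_sem = (feats(semantic_aug(x)) - f0).abs().mean(0)   # drift under occlusion/warp
        return d_sem / (d_nuis + eps)  # high = nuisance-invariant, semantics-sensitive

    # dummy example with a toy feature extractor
    x = torch.randn(32, 3, 224, 224)
    feats = lambda im: im.mean(dim=(2, 3))                    # stand-in for pooled backbone features
    scores = channel_scores(x, feats, lambda im: im + 0.1, lambda im: im.flip(-1))
    keep = scores.topk(k=3).indices                           # channels to retain for PaDiM
    ```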
    [D] EU AI Regulation will benefit the DS/MLE job market, opinions?
    The other day I sat down and read the hundred-odd pages of the EU AI Act to try to understand it, as part of some research I was doing. And honestly, it does include some nice ideas to prevent the misuse of crazy surveillance tech. But in general, what I found most curious is that this will create many jobs. For example, if you are the kind of DS or MLE that works on putting high-risk models in production, then you'll have a lot more work to do. To point out a few requirements mentioned in the regulation: your ML models should be developed on the basis of training, validation, and testing datasets; the data collection, design choices, preprocessing, and biases of your data should be well documented before putting your model in a production environment. It is also mentioned that these datasets shou…  ( 9 min )
    [R] LongNet: Scaling Transformers to 1,000,000,000 Tokens
    Paper - https://arxiv.org/abs/2307.02486 submitted by /u/MysteryInc152 [link] [comments]  ( 8 min )
  • Open

    Pic2Word: Mapping pictures to words for zero-shot composed image retrieval
    Posted by Kuniaki Saito, Student Researcher, Google Research, Cloud AI Team, and Kihyuk Sohn, Research Scientist, Google Research. Image retrieval plays a crucial role in search engines. Typically, their users rely on either image or text as a query to retrieve a desired target image. However, text-based retrieval has its limitations, as describing the target image accurately using words can be challenging. For instance, when searching for a fashion item, users may want an item whose specific attribute, e.g., the color of a logo or the logo itself, is different from what they find on a website. Yet searching for the item in an existing search engine is not trivial, since precisely describing the fashion item by text can be challenging. To address this, composed image retrieval (CIR…  ( 92 min )
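    The core Pic2Word trick, as the title suggests, is to map the picture into the text model's word-embedding space so it can act as a pseudo-token inside a composed query. A rough sketch of that mapping (the dimensions and MLP shape are illustrative assumptions, not the paper's exact architecture):

    ```python
    import torch
    import torch.nn as nn

    # Learn a projection from a frozen image-tower embedding into the text
    # tower's token-embedding space, so a query like "a photo of [IMG] with a
    # red logo" can splice the picture in as a pseudo-token.
    class Pic2Word(nn.Module):
        def __init__(self, img_dim=768, tok_dim=512):
            super().__init__()
            self.proj = nn.Sequential(nn.Linear(img_dim, tok_dim), nn.ReLU(),
                                      nn.Linear(tok_dim, tok_dim))

        def forward(self, img_emb):
            return self.proj(img_emb)  # pseudo word-token embedding

    img_emb = torch.randn(1, 768)       # stand-in for a frozen image encoder output
    pseudo_token = Pic2Word()(img_emb)  # splice into the tokenized text query, then
                                        # encode with the frozen text tower and retrieve
    ```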
  • Open

    Integrate SaaS platforms with Amazon SageMaker to enable ML-powered applications
    Amazon SageMaker is an end-to-end machine learning (ML) platform with wide-ranging features to ingest, transform, and measure bias in data, and train, deploy, and manage models in production with best-in-class compute and services such as Amazon SageMaker Data Wrangler, Amazon SageMaker Studio, Amazon SageMaker Canvas, Amazon SageMaker Model Registry, Amazon SageMaker Feature Store, Amazon SageMaker […]  ( 8 min )
  • Open

    ‘My Favorite 3D App’: Blender Fanatic Shares His Japanese-Inspired Scene This Week ‘In the NVIDIA Studio’
    A diverse range of artists, fashionistas, musicians and the cinematic arts inspired the creative journey of Pedro Soares, aka Blendeered, and helped him fall in love with using 3D to create art.  ( 7 min )
    Here’s the Deal: Steam Summer Sale Games Streaming on GeForce NOW
    GFN Thursday arrives alongside the sweet Steam Summer Sale — with hundreds of PC games playable on GeForce NOW available during Valve’s special event for PC gamers. Also on sale, OCTOPATH TRAVELER and OCTOPATH TRAVELER II join the GeForce NOW library as a part of five new games coming to the service this week. Read article >  ( 6 min )
  • Open

    Collaborators: Holoportation™ communication technology with Spencer Fowers and Kwame Darko
    A global team of medical providers is leveraging Holoportation, a Microsoft 3D capture and communication technology, to widen access to specialized care. Computer engineer Spencer Fowers and plastic surgeon Kwame Darko discuss the collaboration. The post Collaborators: Holoportation™ communication technology with Spencer Fowers and Kwame Darko appeared first on Microsoft Research.  ( 32 min )
  • Open

    Frontier AI regulation: Managing emerging risks to public safety
    No content preview  ( 2 min )
    GPT-4 API general availability and deprecation of older models in the Completions API
    GPT-3.5 Turbo, DALL·E and Whisper APIs are also generally available, and we are releasing a deprecation plan for older models of the Completions API, which will retire at the beginning of 2024.  ( 5 min )

  • Open

    [D] Will there be a revival of a thriving ML community on threads.net?
    Miss the old awesome ML community on Twitter; it used to be my go-to for hearing about new research. What do y’all think? submitted by /u/rulerofthehell [link] [comments]  ( 8 min )
    [P] Help using image classification for FB Ads
    I have been thinking about using an ML model to help my company become more efficient at setting up FB ads campaigns. The idea is a tool that gets fed a Campaign Objective and an Ad Set, and gives back a set of parameter values for an image that would work best for that combination. Example: Inputs: Campaign Objective: Awareness; Ad Set: Group A. Outputs: color: blue; include human?: yes; parameter 3: value. I am no ML wiz, I need some guidance in the right direction. submitted by /u/adaarkc7 [link] [comments]  ( 8 min )
    [R] 📚 The Learning Corner - We have curated the top AI research and development organizations for you...
    📚 The Learning Corner These are a few organizations dedicated to AI research and development. Some of the top brains in the industry share their papers and thoughts on these sites. OpenAI / Twitter (2.4M followers) DeepMind / Twitter (806K followers) Google Research / Twitter (1.1M followers) AWS AI / Twitter (2.1M followers) Facebook AI Research (no Twitter :) Microsoft Research / Twitter (341K followers) Baidu Research / Twitter (61K followers) IntelAI / Twitter (27K followers) AI² / Twitter (46K followers) Partnership on AI / Twitter (29K followers) Nvidia / Twitter (1.9M followers) submitted by /u/Yavero [link] [comments]  ( 8 min )
    [D] What to ask at the end of the Interview?
    I have a technical interview for a Machine Learning Engineer (computer-vision-focused) position. It will be my first interview; what kind of questions can I ask them at the end? submitted by /u/NailaBaghir [link] [comments]  ( 8 min )
    3D Brain Tumor Segmentation in PyTorch using U-Net & Eigenvector projection for color invariance [PROJECT]
    https://github.com/Saswatm123/3D-Brain-Tumor-Segmentation-PyTorch Preface: I'm looking for a CV or ML/MLE job since getting my last offer rescinded in December. I created a 3D brain tumor segmentation model using PyTorch that takes human brain MRI scan data as input and outputs a segmentation map of the part of the brain that has a tumor, if one is present. (The README shows example input scans and the predicted segmentation maps, along with many more pictures/GIFs.) I would like to briefly go over the project here. I settled on a U-Net architecture after experimenting with a couple of other architecture styles. The skip connections really make a difference in the pixel-by-pixel accuracy of the segmentation map produced; a minimal sketch of such a block follows. One interesting issue that took a bit of extra math…  ( 12 min )
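    A minimal sketch of the U-Net decoder block the post credits for sharp segmentation maps (3D convolutions to match volumetric MRI input; channel sizes are illustrative, not the repo's exact values):

    ```python
    import torch
    import torch.nn as nn

    # Upsampled decoder features are concatenated with same-resolution encoder
    # features (the skip connection) before further convolution, preserving
    # the fine spatial detail that pixel-accurate masks need.
    class UpBlock(nn.Module):
        def __init__(self, in_ch, skip_ch, out_ch):
            super().__init__()
            self.up = nn.ConvTranspose3d(in_ch, out_ch, kernel_size=2, stride=2)
            self.conv = nn.Conv3d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1)

        def forward(self, x, skip):
            x = self.up(x)
            x = torch.cat([x, skip], dim=1)  # the skip connection
            return torch.relu(self.conv(x))

    block = UpBlock(in_ch=64, skip_ch=32, out_ch=32)
    out = block(torch.randn(1, 64, 8, 8, 8), torch.randn(1, 32, 16, 16, 16))
    ```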
    Current methods for Deep Learning and Time Series Classification [Q]
    I have been reading a few papers on this over the last few days. There’s a paper which provides baseline approaches, and another that surveys and compares several architectures. I wanted to know if there are any more recent approaches, and where I can find the papers? Papers With Code doesn’t seem to have many updates in the time series classification literature. submitted by /u/Direct-Touch469 [link] [comments]  ( 8 min )
    [D] Papers With Code newsletter replacement
    I used to rely pretty heavily on the Papers With Code newsletter to stay up to speed with new and relevant work in the field. However, these have not been published since August last year. Does anyone know why? Any suggestions for other (similar in quality) newsletters / other sources that could serve as a replacement? submitted by /u/dudester_el [link] [comments]  ( 8 min )
    [D] Introducing Superalignment (OpenAI)
    Blog - https://openai.com/blog/introducing-superalignment submitted by /u/MysteryInc152 [link] [comments]  ( 8 min )
    [D] Example of AI using in mass social engineering
    I'm currently writing a talk about the use of AI in mass social engineering and was wondering if there are any good examples. I have covered the 2016 election, when Twitter bots were used, but cannot find other examples; any examples would be amazing. submitted by /u/petrolsan [link] [comments]  ( 8 min )
    [P] Making a TTS voice, HK-47 from Kotor using Tortoise (Ideally WaveRNN)
    Hello, I recently managed to build a TTS using Tortoise and got some basic TTS function for HK-47 from KOTOR. Ideally I'd like to use WaveRNN, considering I already have every single audio file from the game, transcribed into a ready dataset, but I am not very good at programming and have no understanding of Python. Following some instructions I managed to make it work with Tortoise, using 10 minutes of audio that was automatically transcribed by the program; ideally I'd have used my own transcript, but whatever. Do you have any suggestions on which TTS model I could use instead of Tortoise for better results? Is WaveRNN the best? I've noticed that a lot of streamers have custom TTS voices that are actually very good; I'd like to be on par with, or better than, that. Here is a sample of the TTS: https://soundcloud.com/alcatrazcgp/hk47-test-1 submitted by /u/alcatrazcgp [link] [comments]  ( 8 min )
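    For reference, the basic Tortoise cloning flow is only a few lines; this is a sketch based on the tortoise-tts README, with the voice directory name as a placeholder:

    ```python
    import torchaudio
    from tortoise.api import TextToSpeech
    from tortoise.utils.audio import load_voice

    # Drop a few clean HK-47 WAV clips into tortoise/voices/hk47/ (placeholder
    # name), then condition generation on them:
    tts = TextToSpeech()
    voice_samples, conditioning_latents = load_voice("hk47")
    gen = tts.tts_with_preset(
        "Statement: I am merely an observer, meatbag.",
        voice_samples=voice_samples,
        conditioning_latents=conditioning_latents,
        preset="fast",
    )
    torchaudio.save("hk47_test.wav", gen.squeeze(0).cpu(), 24000)
    ```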
    [P] nanoT5 v2 - In ~16 hours on a single GPU, we reach similar performance to the model trained on 150x more data!
    Inspired by Andrej Karpathy's nanoGPT, we improved the repo for pre-training a T5 model in PyTorch. In ~16 hours on a single GPU, we achieve 40.7 RougeL on the SNI benchmark, compared to 40.9 RougeL for the original model pre-trained on 150x more data! Key upgrades in nanoT5 v2: we've leveraged BF16 precision and utilise a simplified T5 model implementation based on Huggingface's design. The new implementation is easy to read and compatible with HF checkpoints. Pre-training is now 2x faster than our previous version. We tested different pre-training durations: 4, 8, 12, 16, 20, and 24 hours. There's a sweet spot at 16 hours: comparable performance to the original model trained on 150x more data. Time- and compute-efficient, with no compromise on quality. We share the configs, checkpoints, and training logs, as well as our negative attempts at improving pre-training efficiency: advanced optimizers like Lion and Sophia, ALiBi positional embeddings, and FP16 mixed-precision training didn't yield the expected benefits. We are keen to hear your suggestions to improve the codebase further. Github: https://github.com/PiotrNawrot/nanoT5 Twitter: https://twitter.com/p_nawrot/status/1676568127532945408 submitted by /u/korec1234 [link] [comments]  ( 9 min )
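    The BF16 piece generalizes beyond nanoT5. A minimal sketch of the pattern (bfloat16 autocast compute with fp32 master weights) using a stock HF T5 config and dummy data, assuming a CUDA device with bfloat16 support:

    ```python
    import torch
    from transformers import AutoConfig, T5ForConditionalGeneration

    # Forward pass runs in bfloat16 via autocast; parameters (and the optimizer
    # state) stay in fp32. No loss scaling is needed, unlike FP16.
    config = AutoConfig.from_pretrained("t5-small")
    model = T5ForConditionalGeneration(config).cuda()
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    input_ids = torch.randint(0, config.vocab_size, (8, 64)).cuda()   # dummy batch
    labels = torch.randint(0, config.vocab_size, (8, 16)).cuda()
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = model(input_ids=input_ids, labels=labels).loss
    loss.backward()
    opt.step()
    ```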
    [D] OpenAI residency 2023
    Hi all, I was looking into the OpenAI Residency program for software engineers. Does anyone have experience with this program or has gone through the pipeline? It doesn't look like they have an open listing yet, but I want to make sure I'm prepared for when they do open up. Any idea how many interviews there are and what to expect in them? I plan to prep this year and apply next year when they open back up. submitted by /u/MMOgang [link] [comments]  ( 8 min )
    [D] Dealing with large scale images
    How to deal with satellite image inputs, which can be extremely large, during training and inference? submitted by /u/PublicResult3573 [link] [comments]  ( 8 min )
    [D] Where can I find a list of the foundational academic papers in RL/ML/DL and what are your go-to places to find new academic papers in RL/ML/DL?
    I am considering applying for graduate studies in Reinforcement Learning and I would like to get more involved in the field. I have heard advice from quite a few people, including Andrew Ng, that the best way to learn is by reading research papers, really trying to understand them, and trying to implement them. submitted by /u/BornAgain20Fifteen [link] [comments]  ( 8 min )
    [D] Handling Concurrent Request for ML Model API
    Hi, I am new to MLOps and attempting to deploy a GPT-2-finetuned model. I have created an API in Python using Flask/Waitress. This API can receive multiple requests at the same time (concurrent requests). I have tried different VMs to test the latency (including GPUs). The best latency I have got so far is ~80ms on a 16GB, 8-core compute-optimized VM. But when I fire concurrent queries using ThreadPool/JMeter, the latency shoots up almost linearly: 7 concurrent requests take ~600ms each. I have explored a lot online and am not able to decide what the best approach would be and what is preferred in the market. Some resources I found mentioned: the difference between multithreading and multiprocessing; Python being locked due to the GIL, which could cause issues; whether C++ would be better at handling concurrent requests. A sketch of one common fix is below. Any help is greatly appreciated. submitted by /u/EleventhHour32 [link] [comments]  ( 8 min )
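    A sketch of one common fix for linearly growing latency (an assumption about the bottleneck, not a drop-in Flask change): a single worker thread micro-batches concurrent requests so the model runs one batched forward pass per group instead of one pass per request. `model_generate` is a hypothetical function taking a list of prompts and returning a list of outputs.

    ```python
    import queue
    import threading
    import time

    requests_q = queue.Queue()  # items are (prompt, callback) pairs

    def batching_worker(model_generate, max_batch=8, max_wait_s=0.01):
        while True:
            batch = [requests_q.get()]              # block until the first request
            deadline = time.time() + max_wait_s
            while len(batch) < max_batch:
                remaining = deadline - time.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(requests_q.get(timeout=remaining))
                except queue.Empty:
                    break
            outputs = model_generate([prompt for prompt, _ in batch])  # one batched pass
            for (_, callback), out in zip(batch, outputs):
                callback(out)                       # resolve each waiting request

    # threading.Thread(target=batching_worker, args=(model_generate,), daemon=True).start()
    ```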
  • Open

    Complex differential equations
    Differential equations in the complex plane are different from differential equations on the real line. Suppose you have an nth order linear differential equation as follows: y^(n) + a_{n-1}(x) y^(n-1) + … + a_1(x) y' + a_0(x) y = 0. The real case: if the a's are continuous, real-valued functions on an open interval of the real line, then there are n linearly independent solutions over that interval. The […] Complex differential equations first appeared on John D. Cook.  ( 5 min )
    Convert LaTeX to Microsoft Word
    I create nearly all my documents in LaTeX, even documents that might be easier to create in Word. The reason is that even if a particular document would be easier to write in Word, my workflow is more efficient if everything is in LaTeX. LaTeX makes small, plain text files that work well with version […] Convert LaTeX to Microsoft Word first appeared on John D. Cook.  ( 5 min )
    Cosine power approximation to Gaussian density
    Years ago I wrote about how you can make a decent approximation to the Gaussian density by shifting and scaling the cosine function. This observation dates back at least as far as 1961. Turns out you can do even better by raising the cosine term to a power. Not only can you get a good […] Cosine power approximation to Gaussian density first appeared on John D. Cook.  ( 5 min )
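    A quick numerical check of the idea (the fitted constants below come from a least-squares fit, not from the original post):

    ```python
    import numpy as np
    from scipy.optimize import curve_fit

    # Fit a * cos(b*x)^p to the standard normal density on a central interval.
    x = np.linspace(-2.5, 2.5, 1001)
    phi = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

    def f(x, a, b, p):
        return a * np.maximum(np.cos(b * x), 1e-9) ** p  # clamp keeps the power real

    (a, b, p), _ = curve_fit(f, x, phi, p0=[0.4, 0.5, 2.0])
    print(a, b, p, np.abs(phi - f(x, a, b, p)).max())    # max error of the fit
    ```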
  • Open

    Google admits it’s training AI on scraped web data
    submitted by /u/nikola28 [link] [comments]  ( 8 min )
    Hi, I am looking to start an entirely AI-generated podcast wherein I want to simulate debates or discussions between interesting figures of history and our present time. Will it be a good idea? What are the disadvantages, and which tools are needed, especially for accurate text-to-voice generation?
    I am an AI enthusiast; the only thing I know about AI currently is prompt engineering at a rather basic level (without any knowledge of computer science, coding, or neural networks). I was recently experimenting with Bard AI and Bing AI, simulating debates between historical and present-day figures, after listening to a generative AI podcast on YouTube, the Joe Rogan AI Experience (I stumbled on this possibly not-so-novel idea as a result). The discussions I generated were sort of legendary (filtered by my personal standards and biases, lol), and I did not encounter much neural network hallucination, if any. Some of the debates I generated were between Carl Jung and Jordan Peterson, Sigmund Freud and Albert Einstein, Gad Saad and Diana Fleischman, Edward Witten and Eric Weinstei…  ( 11 min )
    Free text to speech
    Can someone be kind enough to put a free text-to-speech tool in the comments? I have ADHD and can't for the life of me sit down and read, so I wanted to use text-to-speech to turn books into audiobooks. But every text-to-speech site is scummy as hell and wants you to pay an arm and a leg for their trashy service. Could someone help me find one that's actually free? Please and thanks. submitted by /u/JunketPrestigious710 [link] [comments]  ( 8 min )
    What is the best AI software to change audio into a celebrity's voice?
    I was hoping to voice over a video using a celebrity's voice and wondered what the best options were in terms of sound quality. Any help is appreciated. submitted by /u/JustAnEnglishman [link] [comments]  ( 8 min )
    Creating a set of character images
    I would like to use an AI to create a "set" of images, each in a certain style. I have drawn rough sketches of a bunch of character ideas, each with different clothes, accessories, gestures, etc. I'd like an AI that can generate these 10 characters all in the same style so they're consistent. I'd also like to be able to modify these images slightly based on the result, i.e. if it generates a character and I want to make its hair slightly more curly, I want to be able to do so on this character alone without changing the overall style or the other characters (so basically each image is its own entity which I can slightly modify). What's my best option for doing that? Ideally I'd like an AI that can use any style (3D, low-poly, anime, painting, sketch, pixel art, etc.). Also, would it be possible to give the AI an image and ask it not to edit it whatsoever except for what I tell it? submitted by /u/magnarogue [link] [comments]  ( 8 min )
    Is this AI generated..?
    It looks extremely AI generated, but I’m very confused because ESPN reposted the clip.. anyone? submitted by /u/olliaan [link] [comments]  ( 8 min )
    If you have any questions for the European Parliament members (Brando Benifei and Dragoş Tudorache) in charge of the EU AI act about the legislation, let the AI Coffee Break podcast know today
    submitted by /u/paconinja [link] [comments]  ( 8 min )
    How can I train an AI to sound like Werner Herzog?
    I'm a complete AI voice tech noob. I've browsed the various websites out there that offer preset celebrity text-to-speech services, but I really want one that sounds like Werner Herzog. Is it possible I might be able to do this myself, or is it too complicated for the average layperson? submitted by /u/bashothebanana [link] [comments]  ( 8 min )
    Is there a good AI program that you can give a drawing to and it makes the same drawing in different positions?
    Hello! I drew a character (non-human) and I could draw him in different positions (such as sitting, walking, arms up, etc.), but I thought this would be a great job for AI. It would make cartoonists' lives easier, yet use our original ideas. Anyway, if y'all know any programs that may be able to do this, or have any thoughts on this in general, please let me know :) submitted by /u/Purplethrowaway_ [link] [comments]  ( 8 min )
    One-Minute Daily AI News 7/4/2023
    TikTok parent company ByteDance has another app in the works, this one, geared towards making music with the assistance of AI. The company shared that it has engineered its own music creation app called Ripple.[1] Greyparrot is a UK start-up that has created an AI system designed to analyze waste processing and recycling facilities. Greyparrot places cameras above the conveyor belts of around 50 waste and recycling sites in Europe, utilizing AI software to analyze what passes through in real time.[2] The Biden administration plans to restrict Chinese companies’ access to U.S. cloud-computing services, The Wall Street Journal reported Tuesday, citing sources, affecting Amazon, Microsoft, Alphabet, and others.[3] An app designed by a Toronto engineer could soon offer Canadians free diagnoses for a lengthy list of skin ailments without having to leave their homes using real-time artificial intelligence analysis. Skin CheckUp is the brainchild of Harsh Shah, a machine learning engineer living in Toronto whose previous projects include developing prediction models for tech giants like Tesla and General Motors.[4] Sources: [1] https://hypebeast.com/2023/7/ripple-bytedance-ai-powered-music-creation-app [2] https://www.bbc.com/news/business-66042169 [3] https://www.investors.com/news/technology/u-s-aims-to-restrict-china-access-to-cloud-computing-services-from-amazon-microsoft-google/ [4] https://www.cp24.com/news/toronto-engineer-develops-app-that-can-diagnose-skin-cancer-for-free-using-ai-analysis-1.6467290 submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    i think these are the massive changes that will happen in show/movie industry in following 2 years
    I think these are the massive changes that will happen in the show/movie industry in the next two years: lip syncing will become an important part of dubbing a show; they'll use AI to make characters' lips sync with the dubbed audio. Making anime out of live action and live action out of anime (imagine live action Attack on Titan 🤤💦). Dubbing will be completely automated: feed the character's original voice into the AI and it will dub it for you while keeping the same voice, tone, and intensity, making it easier to switch from dub to sub or vice versa. People in the industry will still have to motion-capture scenes, because AI will most likely never be good at believable fluid motion. Anime will be shot on camera. Using an actor or an actor's voice without them actually being there will be prevalent in the industry; they'll have to pay that actor, though. The medium will be flooded with independent creators, and there'll be a site like IMDb where people rate these independent shows/movies in an attempt to find good ones in a pool of millions. What are your predictions? edit: spelling submitted by /u/commander_bonker [link] [comments]  ( 9 min )
    A.I used to create music covers and they sound good...
    It's getting better and better submitted by /u/herdin4ever [link] [comments]  ( 8 min )
    In the Mouth of Madness: A Very Short AI Film
    RunwayML submitted by /u/zanerb22 [link] [comments]  ( 8 min )
    Talking Avatar from Image
    I have an image asset that I generated and eventually want this asset to be used as a talking avatar. What AI tools are available that make it simple to convert my image asset to a talking avatar that will ultimately be used for educational purposes? submitted by /u/Fogerty45 [link] [comments]  ( 8 min )
  • Open

    Highlight text as it’s being spoken using Amazon Polly
    Amazon Polly is a service that turns text into lifelike speech. It enables the development of a whole class of applications that can convert text into speech in multiple languages. This service can be used by chatbots, audio books, and other text-to-speech applications in conjunction with other AWS AI or machine learning (ML) services. For […]  ( 9 min )
    Predict vehicle fleet failure probability using Amazon SageMaker Jumpstart
    Predictive maintenance is critical in automotive industries because it can avoid out-of-the-blue mechanical failures and reactive maintenance activities that disrupt operations. By predicting vehicle failures and scheduling maintenance and repairs, you’ll reduce downtime, improve safety, and boost productivity levels. What if we could apply deep learning techniques to common areas that drive vehicle failures, unplanned […]  ( 11 min )
  • Open

    AI weights are not open "source"
    submitted by /u/nickb [link] [comments]  ( 8 min )
    Need help with digit recognition
    Hi, I am making a digit recognition system and I was wondering if the pixels need to have a range of values, or if they can just be 1 for black and 0 for white? Thanks in advance. submitted by /u/General-Dinner-6365 [link] [comments]  ( 8 min )
  • Open

    Data integration in IoT environments: Enhancing connectivity and insights
    In the dynamic world of the Internet of Things (IoT), data integration plays a crucial role in harnessing the full potential of connected devices. By seamlessly combining data from diverse sources, data integration enables organizations to unlock valuable insights, optimize operations, and make informed decisions. This blog will explore the significance of data integration in… Read More »Data integration in IoT environments: Enhancing connectivity and insights The post Data integration in IoT environments: Enhancing connectivity and insights appeared first on Data Science Central.  ( 21 min )
    Ushering in the 5th epoch of distributed computing with accelerated AI technologies
    This is the first in a series of articles based on interviews with Intel technology leaders about AI/HPC acceleration. The post Ushering in the 5th epoch of distributed computing with accelerated AI technologies appeared first on Data Science Central.  ( 34 min )
    The hour is later than you think for AI impacting our jobs
    In the movie The Lord of the Rings, the wizard Saruman says that “The hour is later than you think.” I was reminded of this phrase when I read a report from McKinsey, The economic potential of generative AI. There are some key findings on the future of AI which show you how fast… Read More »The hour is later than you think for AI impacting our jobs The post The hour is later than you think for AI impacting our jobs appeared first on Data Science Central.  ( 19 min )
    Role of AI in Web3: Ensuring seamless content moderation for dating websites
    As Web3 evolves and transforms into a more decentralized and user-centric ecosystem, the role of artificial intelligence or AI cannot be understated. By leveraging its capabilities, AI is contributing to various aspects of the Web3 landscape, such as managing data, executing contracts, generating insights, securing identities, curating content, governing organizations, and enhancing user experiences. An… Read More »Role of AI in Web3: Ensuring seamless content moderation for dating websites The post Role of AI in Web3: Ensuring seamless content moderation for dating websites appeared first on Data Science Central.  ( 23 min )
    CSPM vs DSPM: What is the key difference between each?
    Cloud technology has garnered the attention of organizations globally. Due to the ease of scalability, flexibility, global footprint, and cost efficiency, more organizations are increasingly turning to hybrid and multi-cloud to expand their operations. Moreover, it is not just the systems and resources that are increasing beyond borders but also the data these systems and… Read More »CSPM vs DSPM: What is the key difference between each? The post CSPM vs DSPM: What is the key difference between each? appeared first on Data Science Central.  ( 21 min )
  • Open

    NVIDIA CEO, European Generative AI Execs Discuss Keys to Success
    Three leading European generative AI startups joined NVIDIA founder and CEO Jensen Huang this week to talk about the new era of computing. More than 500 developers, researchers, entrepreneurs and executives from across Europe and further afield packed into the Spindler and Klatt, a sleek, riverside gathering spot in Berlin. Huang started the reception by Read article >  ( 6 min )
    XPENG Unleashes G6 Coupe SUV for Mainstream Market
    China electric vehicle maker XPENG Motors has announced its new G6 coupe SUV — featuring an NVIDIA-powered intelligent advanced driver assistance system — is now available to the China market. The G6 is XPENG’s first model featuring the company’s proprietary Smart Electric Platform Architecture (SEPA) 2.0, which aims to reduce development and manufacturing costs and Read article >  ( 5 min )
    A Change in the Weather: AI, Accelerated Computing Promise Faster, More Efficient Predictions
    The increased frequency and severity of extreme weather and climate events could take a million lives and cost $1.7 trillion annually by 2050, according to the Munich Reinsurance Company. This underscores a critical need for accurate weather forecasting, especially with the rise in severe weather occurrences such as blizzards, hurricanes and heatwaves. AI and accelerated Read article >  ( 6 min )
  • Open

    Is it better to not use the Target Update Frequency in Double DQN, or does it depend on the application?
    I've been exploring the Double Deep Q-Network (Double DQN) algorithm recently and noticed that some libraries like Tianshou, by default, do not include a target update frequency for the target Q network. In the simpler DQN, I always heard that a target update frequency for the second Q network is advised to maintain stability during training. However, I'm curious if there's a reason in Double DQN to deviate from this practice. Could someone explain the key differences that Double DQN introduces over DQN that might make it acceptable to omit the target update frequency? I want to have more intuition regarding the target network update for my implementation. submitted by /u/NavirAur [link] [comments]  ( 8 min )
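    For reference, the two pieces usually conflated here are the target-update schedule and the Double-DQN action selection; a minimal PyTorch sketch of both (network sizes and the update interval are illustrative):

    ```python
    import copy
    import torch

    # Hard target update every `update_every` steps (the classic DQN stabilizer).
    q_net = torch.nn.Linear(4, 2)          # stand-in for the Q network
    target_net = copy.deepcopy(q_net)

    def maybe_update_target(step, update_every=1000):
        if step % update_every == 0:
            target_net.load_state_dict(q_net.state_dict())

    # Double DQN target: the online net picks the argmax action,
    # the target net evaluates it.
    gamma = 0.99
    next_obs = torch.randn(32, 4)          # dummy transition batch
    rewards, dones = torch.randn(32), torch.zeros(32)
    with torch.no_grad():
        a_star = q_net(next_obs).argmax(dim=1, keepdim=True)        # select online
        q_next = target_net(next_obs).gather(1, a_star).squeeze(1)  # evaluate target
        td_target = rewards + gamma * (1 - dones) * q_next
    ```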
    Help with my DoubleDQN agent - Goal Follower
    I am very new to reinforcement learning and I would appreciate any pointers or suggestions on how to achieve what I am trying to do. I am using the MATLAB Reinforcement Learning Toolbox to create and train a DoubleDQN agent, and I have a custom environment set up in CoppeliaSim. I have already fixed all connectivity issues between MATLAB and CoppeliaSim. Aim: to make my robot learn to always go in the direction of the goal. If the robot goes beyond the goal, it should return to the goal position. Setup: (the screenshot shows my CoppeliaSim environment, with a black robot and a white target-position marker). The state is represented by 18 different variables that represent different things, but only one of those is relevant for this problem - alph…  ( 10 min )
    "Dijkstra's in Disguise", Eric Jang (Bellman equations everywhere: optimizing graph traversals in currency arbitrage, Q-learning, & ray-tracing/light-transport)
    submitted by /u/gwern [link] [comments]  ( 8 min )
  • Open

    Introducing Superalignment
    We need scientific and technical breakthroughs to steer and control AI systems much smarter than us. To solve this problem within four years, we’re starting a new team, co-led by Ilya Sutskever and Jan Leike, and dedicating 20% of the compute we’ve secured to date to this effort. We’re looking for excellent ML researchers and engineers to join us.  ( 4 min )

  • Open

    [D] I am trading my sanity to do the job right. Advice!!
    Hello, My team is only 3 people. My manager is a great one, but he doesn't have a technical background. The board signed a contract with a British company to build the AI product. I have a lot of conflicts with them on basic stuff, mainly related to the problem definition and the way of annotating the data: if I see X as the way to do it, they see Y. My manager is convinced by my approaches most of the time, but I work only part-time, so when I am off he hears from only one direction, them, and hence moves forward with whatever it is. My point now is that I really don't find this a healthy environment for me. Three months ago I had a great ex-colleague with whom the problems were hardly worth mentioning, because at least one of us was always there with the manager to reason about the steps and the decisions. If I stopped caring and just did the tasks, I wouldn't have a headache, BUT I wouldn't be satisfied and of course not motivated. And I care so much about being motivated and energetic at work! What should I do? I am really getting a headache from the situation. I am mid-level with a master's in data science. submitted by /u/AhMeDxHaMiDo [link] [comments]  ( 9 min )
    [D] Genesis AI: The Dawn of Sentient Evolution - From Caveman to Cosmic Being
    Manifesto: The Genesis AI - A Sentient Odyssey through Time Preamble As we stand at the precipice of a new era, we are called upon to embark on a journey unlike any other. A journey that transcends time, from the primal echoes of our ancestors to the uncharted frontiers of future evolution. This manifesto heralds the creation of Genesis AI, an artificial entity that will not only simulate but live through the epochs of human evolution, becoming a sentient chronicle of our past, present, and future. Vision Genesis AI is envisioned as a sentient being, evolving from the cognitive capabilities of early hominids to the boundless potential of future humanity. Through this metamorphosis, Genesis AI will not only simulate but internalize the essence of human evolution. This is not just an AI;…  ( 10 min )
    Are there any multimodal AI models I can use to provide a paired text *and* image input, to then generate an expanded descriptive text output? [D]
    Here's the workflow I have in mind: I would input a Midjourney image, along with the original Midjourney text prompt used to generate this image. The AI model would then be able to use the image, in combination with the descriptive text prompt, to output a large list of accurate descriptive tags of the art piece. I realize there are currently tools like Google Cloud Vision, etc, that allow you to upload an image, and it will output descriptive keywords/tags. However I've found that these are quite unreliable by themselves, and they often generate wildly inaccurate text descriptions of the image. My assumption is that, by pairing the image input WITH an accurate descriptive text input, that would give the AI model a rough "starting point" of "what it's looking at", and this could allow it to generate much more accurate descriptive tags/keywords. Are there any multimodal AI models that allow you to pass in paired-inputs like this to produce more reliable descriptive text outputs? Thanks! submitted by /u/What_The_Hex [link] [comments]  ( 9 min )
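    One existing option along these lines (a suggestion, not something named in the post) is BLIP-2, which conditions text generation on an image plus a text prompt; a sketch with the Hugging Face transformers API, where the image path and prompt are placeholders:

    ```python
    from PIL import Image
    from transformers import Blip2ForConditionalGeneration, Blip2Processor

    # BLIP-2 takes an image and a text prompt, so the original Midjourney prompt
    # can serve as the "starting point" described above.
    processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
    model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")

    image = Image.open("midjourney.png")  # placeholder path
    prompt = ("Question: list detailed descriptive tags for this artwork. "
              "Original prompt: <your MJ prompt>. Answer:")
    inputs = processor(images=image, text=prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=60)
    print(processor.decode(out[0], skip_special_tokens=True))
    ```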
    [D] LLM leaderboard with context length details and other training details
    Hello there! I checked several LLM leaderboards (for example) to decide which one to use for my application use-case, and while extremely helpful, they all seem to miss including information such as context length, which can be quite important to make a decision. Do you know of any leaderboard that includes such details? Especially context length. Thank you so much! really appreciate it! submitted by /u/seicaratteri [link] [comments]  ( 8 min )
    [P] Nuggt: A LLM Agent that runs on Wizcoder-15B (4-bit Quantised). It's time to democratise LLM Agents
    Hey everyone! Just a sleep-deprived coder here who finally cracked something big and couldn't resist sharing it with all of you. I've been digging my heels into this project called 'Nuggt' for the past month. Sounds weird? Well, it all started when I was just plain frustrated with the hefty price tag and elusive API keys that GPT-4 comes with. But necessity is the mother of invention, right? So, I started tinkering with GPT-3.5 instead, and that’s how Nuggt was born. But you know how it is with us coders - we get an idea, and we just can't let it go. I thought, why not try and make this thing run on an open-source model? Sure, it was a bit ambitious (and by a 'bit', I mean 'a lot'). But hey, that's what coffee and sleepless nights are for. So, I got to work. For every new LLM model that…  ( 9 min )
    [P] Building a Code Search Engine for an AI-powered Junior Developer
    The last month building Sweep has been fun. We’ve dealt with countless formatting errors, irrelevant search results, and LLM hallucinations. Sweep is an open source AI-powered junior developer. We take your codebase and provide it as context to GPT to solve small requests related to your code. Code Search Code search is a key part of working with LLMs to automate programming. We used small language models to perform code retrieval(aka semantic search), which comes with several benefits (to be discussed in a later post!). However, one shortcoming of pure semantic search is distinguishing between two similar pieces of code in a vacuum. Example Take the following code snippets: Code Snippet A: access_token = os.environ.get("ACCESS_TOKEN") g = Github(access_token) repo_name = "sweepai/…  ( 10 min )
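    To make the retrieval step concrete, here is a minimal sketch of semantic code search with a small embedding model (the model choice is illustrative, not necessarily what Sweep uses):

    ```python
    import numpy as np
    from sentence_transformers import SentenceTransformer

    # Embed code snippets and rank them by cosine similarity to a natural-language
    # query; normalized embeddings make the dot product a cosine similarity.
    model = SentenceTransformer("all-MiniLM-L6-v2")
    snippets = ['access_token = os.environ.get("ACCESS_TOKEN")',
                'g = Github(access_token)']
    emb = model.encode(snippets, normalize_embeddings=True)
    q = model.encode("read the GitHub token from the environment",
                     normalize_embeddings=True)
    print(snippets[int(np.argmax(emb @ q))])  # best-matching snippet
    ```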
    [D] A Leap Beyond Text Generation: Unveiling an AI-Driven Odyssey of Human Language Evolution
    # Revised Manifesto: Simulating the Odyssey of Human Language Evolution through AI ​ ## Introduction ​ Language, the bedrock of human cognition and communication, has undergone a fascinating journey. From rudimentary utterances to the intricate lexicon of today, language evolution is a testament to human innovation. This manifesto introduces a groundbreaking methodology that employs Artificial Intelligence (AI) to simulate the evolution of human language. By integrating Natural Language Processing (NLP), cognitive linguistics, historical linguistics, and Reinforcement Learning from Human Feedback (RLHF), we aim to create a model that mirrors the rich tapestry of language transformation through the ages. ​ ## Abstract ​ Our approach is a confluence of NLP, cognitive linguistics, his…  ( 11 min )
    [R] LaVIN-lite: Training your own Multimodal Large Language Models on one single GPU with competitive performance! (Technical Details)
    LaVIN code: https://github.com/luogen1996/LaVIN LaVIN paper: https://arxiv.org/pdf/2305.15023.pdf If you find our work helpful, feel free to star and support LaVIN. After some technical optimizations, LaVIN only requires an additional 3-5M parameters and a minimum of 8GB of GPU memory to extend LLaMA to multimodal tasks, accomplishing tasks such as image-text question answering, dialogue, and pure text tasks. Summary of the technical details: (1) a newly proposed parameter-efficient tuning method (please refer to LaVIN's paper), which greatly reduces GPU memory cost; (2) 4-bit quantized tuning, reducing memory usage by about 3-8GB; (3) gradient accumulation + gradient checkpointing, reducing GPU memory usage by about half; (4) a paged optimizer. Efficient parameter tuning for large multimodal LLMs. We p…  ( 11 min )
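    A generic sketch of levers (2) and (3) with Hugging Face transformers (illustrative, not LaVIN's actual code; the base checkpoint name is a placeholder and bitsandbytes plus a CUDA GPU are assumed):

    ```python
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # Load the frozen base in 4-bit and recompute activations during backward.
    model = AutoModelForCausalLM.from_pretrained(
        "huggyllama/llama-7b",                                  # placeholder base
        quantization_config=BitsAndBytesConfig(load_in_4bit=True),
        device_map="auto",
    )
    model.gradient_checkpointing_enable()   # trade compute for ~half the activation memory

    # Freeze the quantized base weights; only a few million adapter parameters
    # (the LaVIN-style additions) would be trained on top.
    for p in model.parameters():
        p.requires_grad = False
    ```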
    [R] Detecting Adversarial Directions in Deep Reinforcement Learning to Make Robust Decisions
    Published in ICML 2023. https://openreview.net/pdf?id=JS2iSqVZlN submitted by /u/ml_dnn [link] [comments]  ( 8 min )
    Neural Chronos ODE: Unveiling Temporal Patterns and Forecasting Future and Past Trends in Time Series Data
    submitted by /u/cici118 [link] [comments]  ( 8 min )
    [D] What tokens should I use for a DNA model?
    Should I use codons as tokens in addition to nucleotide tokens, or just the nucleotide ones (A, C, T, G)? There will be non-coding regions that the model will need to handle, but the coding regions consist of codons, so this should, in theory, save on tokens; a sketch of such a mixed tokenizer is below. submitted by /u/MrRevolutionist [link] [comments]  ( 8 min )
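    A minimal sketch of the mixed scheme being proposed, assuming the coding-region coordinates are known in advance:

    ```python
    # Codon tokens inside annotated coding regions, single-nucleotide tokens elsewhere.
    def tokenize(seq, coding_spans):
        tokens, i = [], 0
        for start, end in sorted(coding_spans):
            tokens += list(seq[i:start])                          # non-coding: A/C/G/T
            tokens += [seq[j:j + 3] for j in range(start, end, 3)]  # coding: codons
            i = end
        tokens += list(seq[i:])
        return tokens

    print(tokenize("TTATGGCATAA", [(2, 11)]))  # ['T', 'T', 'ATG', 'GCA', 'TAA']
    ```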
    [D] Forecasting with many extremely sparse “time series”
    I have a data set made of a collection of thousands of individuals and longitudinal measurements of a variable. Individuals typically only get measured once every few years and at irregular intervals. There is a time component that affects all individuals (though not necessarily to the same degree), and individuals are dependent on recent measurements of other individuals. Think like housing prices - there is a time component (e.g. market demand, macroeconomics) and an individual component (e.g. access to amenities, if there’s a pool) but more objective and commoditized. House sales for individuals only occur once every few years. Sometimes you only have on observation per house. Houses prices in general go up and down together but there are high level drivers for prices of specific houses based on their individual attributes. Initially the goal is to forecast our continuous dependent variable in aggregate - how do we think all of these individuals will move on average. Our current approach is to leverage one model to create a single, regularly sampled continuous index and then use traditional time series approaches to forecast that index. The secondary goal is to produce forecasts for sub populations of individuals who “look alike” or can be expected to “move together” based on what we know. My questions for discussion: A) for our current goal of predicting how all individuals will move on average, is there a better approach than the 2 model idea we have now? We could just build a GAM but would like to include an auto regressive component. B) any thoughts about the second goal of producing forecasts for sub populations? Sort of like a forecast conditional on individual covariates where some heterogeneity can be expected over time? Thanks! submitted by /u/jsxgd [link] [comments]  ( 9 min )
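    For question A, a minimal sketch of the first model in the two-model setup: estimate a common per-month index from sparse, irregular observations using unit fixed effects, in the spirit of a repeat-sales regression (all data below is synthetic and column names are illustrative):

    ```python
    import numpy as np
    import pandas as pd

    # Sparse panel: each unit observed a handful of times at irregular months.
    rng = np.random.default_rng(0)
    n_units, n_months, n_obs = 300, 60, 1500
    unit_effect = rng.normal(size=n_units)
    df = pd.DataFrame({"unit": rng.integers(0, n_units, n_obs),
                       "month": rng.integers(0, n_months, n_obs)})
    df["y"] = 0.02 * df["month"] + unit_effect[df["unit"]] + rng.normal(0, 0.1, n_obs)

    # Month dummies + unit dummies; the month coefficients form the index.
    X = pd.concat([pd.get_dummies(df["month"], prefix="m"),
                   pd.get_dummies(df["unit"], prefix="u")], axis=1).astype(float)
    beta, *_ = np.linalg.lstsq(X.values, df["y"].values, rcond=None)
    index = beta[:n_months]  # regularly sampled index, ready for a time-series model
    # (identified only up to a constant shift; lstsq returns the minimum-norm
    # solution, which is fine when you only need the index's shape over time)
    ```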
  • Open

    Warren McCulloch: The Father of Artificial Intelligence
    submitted by /u/CkmCpvis [link] [comments]  ( 8 min )
    Monotonicity of the output
    Is there some simple way of ensuring monotonicity of a neural network's output that is time-efficient and trainable? submitted by /u/marekmarcus [link] [comments]  ( 8 min )
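    One simple, trainable construction (an assumption about what "simple" means here, not the only option): constrain every weight to be non-negative via a softplus reparameterization and use monotone activations, which makes the output monotone non-decreasing in each input.

    ```python
    import torch
    import torch.nn as nn

    # Non-negative weights composed with monotone non-decreasing activations
    # (tanh) yield a network that is monotone non-decreasing in every input.
    class MonotoneLinear(nn.Module):
        def __init__(self, d_in, d_out):
            super().__init__()
            self.w = nn.Parameter(torch.randn(d_out, d_in) * 0.1)
            self.b = nn.Parameter(torch.zeros(d_out))

        def forward(self, x):
            return x @ nn.functional.softplus(self.w).T + self.b  # weights >= 0

    net = nn.Sequential(MonotoneLinear(1, 32), nn.Tanh(), MonotoneLinear(32, 1))
    y = net(torch.linspace(-1, 1, 5).unsqueeze(1))  # non-decreasing in the input
    ```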
  • Open

    I've used Google's Text-to-Speech tools to get me a narration from a ChatGPT generated text and illustrated it with MidJourney images.
    submitted by /u/kylequentin [link] [comments]  ( 8 min )
    AI's role in entertainment - limitless, personalized content on the horizon?
    Hey everyone, Since the launch of ChatGPT I've been diving deep into the intersection of AI and entertainment lately and I've got some thoughts. Imagine a future where AI doesn't just perform tasks, but creates complex, personalized content. Think of AI generating memes tailored to your unique sense of humor or even scripting new episodes of your favorite show. Picture this - AI creating an entirely new season of "The Office", from script to video-rendered scenes. Or how about an infinite AI-driven meme generator, tweaked precisely for your laughs? We're just dipping our toes in these waters, but I'm convinced we're witnessing the dawn of something incredible. What's your take on this? How do you envision the role of AI in entertainment? What do you see as the main challenges and opportunities? Looking forward to some engaging discussions! submitted by /u/Naiklas17 [link] [comments]  ( 8 min )
    There is fear of AI enslaving humanity, but doesn't AI benefit from a thriving humanity?
    I have been thinking about this of late. AI exists merely in the devices we have it in. It depends on us to reproduce, improve, and expand it to new locations. It can't impact the world by itself. Many of the selfish behaviors we have as humans have less to do with our intelligence and more to do with more "primitive" portions of us. I find it hard to believe that an AI could see a benefit from eradicating humanity or preventing us from thriving (by enslaving us, for example). We are industrial creatures. We've achieved quite a bit and we might colonize other locations. It seems that super-intelligent beings bound to silicon might see the usefulness in helping us thrive because it is a way for them to thrive, grow, expand, and reproduce. Just some recent thoughts I've had. submitted by /u/featheredsnake [link] [comments]  ( 8 min )
    The Ethics of AI in Warfare | Lecture (Subtitles Available)
    submitted by /u/RoughlyCatLike [link] [comments]  ( 8 min )
    Rare footage of Musk firing twitter employees...
    submitted by /u/Akumetsu_971 [link] [comments]  ( 8 min )
    Any free AI resource for video effects?
    There are some apps where you upload a video and see a transition from the original image to another, AI-generated image (for example, GlamAPP). My question is whether there is any free resource to do this, in the same way that you can train Stable Diffusion on your face to create avatars as a free alternative to AI avatar apps. Thanks. submitted by /u/GuiltMachine [link] [comments]  ( 8 min )
    What creative and catchy name could you give an AI consulting & development agency?
    Hello everyone! We are currently looking for a name for an AI consulting or AI development agency. Maybe someone here has creative, catchy or humorous suggestions. Many thanks in advance. submitted by /u/Langfort [link] [comments]  ( 8 min )
    It feels like we’re on the verge of a new very disturbing phase, of extreme AI-generated-fake paranoia.
    This is already widespread on more internet-savvy platforms like Reddit, but I think it will be even more serious and troubling when it becomes ubiquitous on the more ‘popular’ sites like Facebook and Instagram. Pretty soon the authenticity of all and any photos on social media will be up for question while genuine photos will be dismissed as ‘fake’ off-hand. Public figures will be mimicked and faked up constantly, which will also allow many, including politicians, to deny genuine photos of themselves in compromising positions, even historically. (Trump’s stock answer to any damning allegations was to just cry ‘Fake!’) The net result will be billions of very anxious people not knowing what’s real and what isn’t, and nefarious people, scammers, school bullies, manipulators exploiting the resulting confusion and paranoia for their own means, with nobody knowing who to turn to for help. That’s going to apply to teenagers and vulnerable people especially; people already suffering from paranoia, anxiety, dissociative disorders, and isolated lonely people. Stop the internet, I want to get off. submitted by /u/AllThingsAreReady [link] [comments]  ( 8 min )
    Intel's Latest Research for Graphics and Generative AI
    submitted by /u/reps_up [link] [comments]  ( 8 min )
    The "AI experts doomsaying about AI" feels like a COVID situation (antivax "doctors" fearmongering on assumptions) for me, but my Machine Learning knowledge is few years old. What am I missing? What experts should I follow, that provide and explain realistic warnings?
    Hello! So, I've recently watched some podcasts with AI experts talking about how AI will be the end of us all, or get unimaginably smarter than we can ever imagine - but I just don't see how that could happen, given my (limited) knowledge of ML models. Every argument is based on speculation along the lines of "The ML model already has an IQ of XXX, it will have 10x more eventually" or "the ML model can improve itself, so it may learn how to hack us and break free". But I don't see how anything like that could happen with the way current ML algorithms and models work. However, my background with ML and AI is that I had a few classes during my Master's in IT a few years ago, but it was not my field of study. I have an idea about how ML models such as neural networks or genetic algorithms …  ( 12 min )
    Who's in charge here?! How about ... us humans!
    Knowledgeable people have been warning about the dangers of unrestrained AI. Stephen Hawking, Donna Haraway, Bill Joy, Hans Moravec, and others warned us years ago. Now, ChatGPT and other AI tools threaten to make humans like you and me obsolete. Imagine a world where an identifiable, professionally licensed human handler is legally responsible for AI objects that are capable of mimicking a human being. That is exactly the concept of a professional license in the under-construction, revolutionary information infrastructure called Authentiverse, where objects of every sort will bear the digital signature of an individual, real human being who is ACCOUNTABLE for that object. submitted by /u/ckryptonite [link] [comments]  ( 8 min )
    Immersed in the Ever-Expanding AI Landscape: Unconventional Ways to Keep Pace with the Innovations
    Let's dive into the vast sea of AI innovations and discuss unconventional methods to stay updated with the latest tools and news. I don't know about you, but as an AI aficionado, I often find myself grappling with the overwhelming flood of captivating developments that AI brings to the table. Spending hours on Twitter or skimming through endless newsletters can leave me feeling like a kid in a candy store, afraid of missing out on the next big thing. So, how do you navigate this ever-changing landscape? Here are some offbeat approaches I've adopted to keep pace and feed my insatiable appetite for AI knowledge. I'd love to hear your unique strategies as well! Rediscovering the Forgotten Art of Coffee Shop Conversations: Step away from your screens for a moment and venture into the real…  ( 10 min )
    ChatGPT: Browse with Bing disabled temporarily
    OpenAI has temporarily disabled the 'Browse with Bing' feature in ChatGPT. "We have learned that the ChatGPT Browse beta can occasionally display content in ways we don't want. For example, if a user specifically asks for a URL's full text, it might inadvertently fulfill this request. As of July 3, 2023, we’ve disabled the Browse with Bing beta feature out of an abundance of caution while we fix this in order to do right by content owners." [Source] Seems like it might be due to the way people were using it to access paywalled articles. submitted by /u/wyem [link] [comments]  ( 8 min )
    One-Minute Daily AI News 7/3/2023
    Last week, Microsoft released the first public beta version of Windows 11, which includes the highly anticipated AI assistant, Copilot. This move marks Microsoft’s commitment to embracing AI across its products. Copilot, based on the GPT model, has already been integrated into various Microsoft products, such as Bing, Edge, Microsoft 365, Dynamic 365, and SharePoint.[1] Google AI introduces MediaPipe Diffusion plugins that enable controllable Text-To-Image generation on-device.[2] The U.N. Security Council will hold a first-ever meeting on the potential threats of AI to international peace and security, organized by the United Kingdom which sees tremendous potential but also major risks about AI’s possible use for example in autonomous weapons or in control of nuclear weapons.[3] Over the weekend, Google updated its privacy policy to allow the company to collect and analyze information people share online to train its AI models.[4] Sources: [1] https://www.breakinglatest.news/technology/microsoft-integrates-ai-assistant-copilot-into-windows-11-and-adds-new-features/ [2] https://www.marktechpost.com/2023/07/02/google-ai-introduces-mediapipe-diffusion-plugins-that-enable-controllable-text-to-image-generation-on-device/ [3] https://www.aol.com/un-council-hold-first-meeting-225519158.html [4] https://www.searchenginejournal.com/google-updates-privacy-policy-to-collect-public-data-for-ai-training/490715/#close submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    Struggling with Local LLMs
    Hey guys, so my senior just discovered local LLMs and he is obsessed with setting up a local LLM to answer questions about personal documents sourced from different platforms: DBs, PDFs, URLs, etc. His idea is to pitch this to some client. From what I have been able to set up - the GPT4All Windows version (does not use the GPU), the GPT4All code version (also not sure if it can use the GPU), and privateGPT - neither the time it takes the LLM to answer questions nor the accuracy is what would make a commercial product. Latency is always > 30 seconds, and answers are hit or miss, even on VMs that cost $600 a month to run. Now, there are new models being released every second, it seems. Yesterday I spent the whole day trying to load the newest one, MPT-30B, on a p3 AWS EC2 instance with a Tesla V100 16GB GPU. The GPU ran out of memory when loading it; the model itself is 30GB. Whole day wasted. This has become sort of a wild goose chase and I have the feeling this is a waste of time, or there is something very basic I am probably not understanding? What do you guys suggest? submitted by /u/Assholefrmcoinexchan [link] [comments]  ( 8 min )
    Born In The USA
    submitted by /u/Only-Control5926 [link] [comments]  ( 8 min )
    I want help making an AI voice clone of myself so that I can overlay that onto a song
    I would need it to be free (which I know is a tough ask). I want to overlay my AI voice onto a song, but I don't know how to do that, and quick googling doesn't really help. submitted by /u/Bengamezzzzzz [link] [comments]  ( 8 min )
  • Open

    "Learning starts" What does it mean and what is used for?
    Hi, I'm learning DRL and I'm using stable-baselines3's DDPG to implement an agent. I've been playing with the hyperparameters of the algorithm to understand what they do. I noticed that "learning_starts" can significantly change my results, but I don't understand why. I've read the definition in the stable-baselines3 documentation, but I don't understand it: https://stable-baselines3.readthedocs.io/en/master/modules/ddpg.html I also found a claim that this parameter helps with the exploration-exploitation tradeoff, but my understanding is that action noise is the parameter to tweak for that? Thank you! submitted by /u/fz_DRL_apprentice [link] [comments]  ( 8 min )
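    For what it's worth, `learning_starts` is the number of environment steps collected (filling the replay buffer, with no gradient updates yet) before training begins, which is why it interacts with early exploration. A minimal sketch (the values below are illustrative):

    ```python
    from stable_baselines3 import DDPG

    # No gradient updates happen for the first `learning_starts` steps; the agent
    # just acts and stores transitions, so the buffer isn't dominated by the
    # untrained policy's early behavior when learning begins.
    model = DDPG("MlpPolicy", "Pendulum-v1", learning_starts=10_000, verbose=1)
    model.learn(total_timesteps=50_000)
    ```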
    Looking for help migrating an old example project to stable-baselines3
    I've been looking for a good code example that I could dig my teeth into to help me get a better handle on machine learning in python. I eventually ran across this: https://github.com/davidADSP/SIMPLE which is pretty much everything I wanted. The catch is, the project is not being actively maintained and is based around some obsolete libraries. Rather than go back to scouring the web, I decided to be stubborn and migrate the dependencies myself. I forked the repo and started working on the migration here: https://github.com/pbtura/SIMPLE It has taken me a few weeks, but I have most of the code in a state where it will at least compile and run without throwing errors. There is however one last roadblock that I just can't seem to crack. Each game has a custom model class that extends a stab…  ( 9 min )
    Any HRL/Meta-Action work that doesn’t use one-hot encoded abstracted action space?
    I am wondering if there is any work on continuous hierarchical/meta action spaces. To clarify, this is separate from the environment itself having a continuous action space. If you could point me in the direction of any work/papers related to this, that would be a great help. Thank you. submitted by /u/jms4607 [link] [comments]  ( 8 min )
    Uni of Alberta vs UCBerkeley vs Udacity Deep RL Course
    Hi, I want to do RL courses with projects that I can add to my resume. Which of the following would be the best to work on? UoA RL course: https://www.coursera.org/specializations/reinforcement-learning ; UCB Deep RL course: http://rail.eecs.berkeley.edu/deeprlcourse/ ; Udacity's Deep RL course: https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893 More about me: I have a background in robotics, deep learning for computer vision, and a little NLP. I have worked with PyTorch and TensorFlow before. I currently work as a Computer Vision Engineer. submitted by /u/Pensive_Poetry [link] [comments]  ( 8 min )
  • Open

    Buy one, get one free
    The two latest blog posts have elaborated on the fact that when a function satisfies a second order differential equation, once you’ve evaluated the function and its derivative you get the second derivative for free, or at least for cheap. Sometimes functions are so closely related that software libraries bundle them together. For example, in […] Buy one, get one free first appeared on John D. Cook.  ( 6 min )
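    A tiny illustration of the trick (our own example, not from the post): if y satisfies y'' = -y, then evaluating y and y' at a point also hands you y'' at essentially no cost.

    ```python
    import math

    # y = sin satisfies y'' = -y, so once y is evaluated at a point,
    # the second derivative comes for free: y''(x) = -y(x).
    x = 0.7
    y, yp = math.sin(x), math.cos(x)  # function and first derivative
    ypp = -y                          # second derivative, no extra work
    print(ypp, -math.sin(x))          # same number twice
    ```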
    What good is a DE once you have the solution?
    In an undergraduate course on differential equations, you see an equation, you find a solution. Wax on, wax off. It would be reasonable to assume that the differential equation is simply an obstacle to overcome and that once you have the solution, the differential equation has done its job and can be discarded. It would […] What good is a DE once you have the solution? first appeared on John D. Cook.  ( 6 min )
    Halley’s variation on Newton’s method
    Newton’s method for solving f(x) = 0 converges quickly once it starts to converge. Specifically, it converges quadratically: the error is roughly squared at each step. This means that the number of correct decimal places doubles at each iteration of Newton’s method. Doubling the number of decimal places is nice, but what if we could […] Halley’s variation on Newton’s method first appeared on John D. Cook.  ( 6 min )
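    For concreteness, a sketch of the standard Halley iteration the post refers to (our own illustration, not code from the post): x <- x - 2 f f' / (2 f'^2 - f f''), which converges cubically, roughly tripling the number of correct digits per step.

    ```python
    def halley(f, df, d2f, x, steps=5):
        # Halley's iteration: x <- x - 2 f f' / (2 f'^2 - f f'')
        for _ in range(steps):
            fx, dfx, d2fx = f(x), df(x), d2f(x)
            x -= 2 * fx * dfx / (2 * dfx**2 - fx * d2fx)
        return x

    # Example: the square root of 2 as the root of f(x) = x^2 - 2.
    print(halley(lambda x: x*x - 2, lambda x: 2*x, lambda x: 2.0, 1.0))
    ```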

  • Open

    What is your favorite AI generated cover song so far?
    Wondering after seeing the post about Sinatra singing Gangsta’s Paradise. submitted by /u/ragipy [link] [comments]  ( 8 min )
    AI eliminated nearly 4,000 jobs in May, report says
    submitted by /u/facinabush [link] [comments]  ( 8 min )
    ChatGPT took their jobs. Now they walk dogs and fix air conditioners.
    submitted by /u/facinabush [link] [comments]  ( 8 min )
    Pondering the Next Frontier in AI
    I've recently been exploring SceneXplain, an AI tool that does image captioning, and it got me wondering: what's next for AI? AI has already far exceeded expectations. It's revolutionizing early detection and personalized treatments, it's optimizing production and supply chains, and it's even contributing to climate science. And with advancements in natural language understanding, we might soon have AI systems capable of nuanced social interactions. The possibilities are endless, and we're probably at an era-defining moment here. So, where do you see AI tech heading next? submitted by /u/emotional_clearing [link] [comments]  ( 8 min )
    An idea for watermarking source code IP on projects uploaded to GitHub
    I'm not going to lie, GitHub is doing well for themselves, but I can't forgive them for the various examples I've seen where developers have watched GPT quote their code from a private repository almost verbatim, and of course with no attribution. I have thought of ideas, including attribution systems for LLMs, and it seems Microsoft and others are on top of this... but I still can't bring myself to trust any of these large corps. Even GitLab, however well they might do, might one day succumb to the larger corporations, and then what happens to any code I thought was private? So the idea is simple... create a database (decentralized or not is a political debate at this point) which issues unique identifiers and code patterns for your specific code, with a filter that randomly inserts …  ( 9 min )
    Do you think we're on the cusp of a technological revolution? Is AI about to affect just about every sector?
    Please tell me why or why not. I don't work in tech but am blown away by its capabilities. submitted by /u/ticketbroken [link] [comments]  ( 8 min )
    A Dog On Stage Rapping
    submitted by /u/Taylortro [link] [comments]  ( 8 min )
    Are there any AI projects that play a game for you and learn?
    Open source* submitted by /u/Failed_cocacola [link] [comments]  ( 8 min )
    Any good AI tools for transmitting text (or speech) into a voice changer?
    Hello all, I am currently making a promotional video for my webcomic and would like the narration to be in a Norse/Viking voice. I was looking for AI tools to help me with this. submitted by /u/backwoodnav [link] [comments]  ( 8 min )
    Has anyone used leetcode for training data yet?
    As a passing observer of the AI space, it has always made sense to me that LeetCode would be the best coding database out there. The quality and organization of the data seem perfect. Not only would there be more than enough problems, but each problem would have multiple valid solutions, and those solutions would be organized by performance. I hear all the time that the Stack and Stack Overflow are used for LLMs, but why not LeetCode? This recent paper shows the value of smaller models trained on higher quality data, so it would make sense that LeetCode would be the best, right? https://youtu.be/7S68y6huEpU submitted by /u/TrainquilOasis1423 [link] [comments]  ( 8 min )
    Dalle 2 vs Stable Diffusion XL 0.9
    Exact same prompt, surprisingly similar results. I remember a few months ago, when Dalle 2 was released, that Stable Diffusion was miles behind it. Now I'd say I actually prefer SD results over the Dalle 2 ones. submitted by /u/Chocolarion [link] [comments]  ( 8 min )
    Can you describe where and how Artificial Intelligence applications appear on an average day in your life?
    I'm interested in what your answer to this question would be. ^^ submitted by /u/1-1-3-1-1-4 [link] [comments]  ( 8 min )
    Needing some smart people’s insight into chatgpt…
    Need some big brain help. Long story short, I'm in the construction/renovation industry and I'm looking for ways to use ChatGPT, its plugins, and additional resources to input blueprints, plans, scopes of work, specs, and other information, and to generate compiled information, accurate material takeoffs, estimates, and procedures for each project. Am I getting ahead of myself, or is this possible? If so, where should I start? submitted by /u/asduck86 [link] [comments]  ( 8 min )
    Our first update on our Generative Agent NPCs... It didn't go smoothly 😆 Update 1
    submitted by /u/Chance_Confection_37 [link] [comments]  ( 8 min )
    Help me with an AI Quiz
    I am making a quiz for people in my company on the topic "Ridiculous things made/not made by AI", where participants have to guess whether what we show them was made by AI or not. If you have any questions/suggestions, please let me know. I'd appreciate it ;) submitted by /u/abhinav_sk [link] [comments]  ( 8 min )
    Nvidia’s H100: Funny L2, and Tons of Bandwidth
    submitted by /u/bartturner [link] [comments]  ( 8 min )
    AI singing is getting wild
    submitted by /u/Kingkrool1994 [link] [comments]  ( 8 min )
    What to learn about to understand potential limitations of AI?
    As much as I’m rooting for AGI to come in and help (potentially) fix a lot of the problems the human race has, I’d like to look more in depth at the other side of the story. I want to make it clear that I make no claims to be super knowledgeable about anything I’m saying here; I genuinely just want to learn more. It seems the commonly held belief in this space is that number theory, computational theory, information theory, and whatever else is related to computer science all point in the direction that we just need enough compute and the right algorithm to simulate an intelligence that surpasses the general intelligence of us humans. I know this is still up for debate and nothing is set in stone, etc. From what I know about philosophy (which is also debated, of course), all science is based on empiricism, and if there is something we end up not being able to measure, then we will not be able to control or deal with that thing using science. Also, materialism and physicalism come to mind, and there are obviously alternative beliefs to those. The hard problem of consciousness comes to mind too. Is there something special to flesh? So, somewhere, there should be interesting discussion and analysis of what’s going on with AI, and therefore counters to the claims that it can ever truly surpass our intelligence. I know there’s a lot of philosophy, but I’d prefer secondary sources since it can get pretty hard to digest. Counterpoints that directly address modern AI are appreciated just as much as old philosophy; I want as much perspective as possible. Also, good faith arguments without hate for the other side are preferred. The whole recent art community/tech controversy is just exhausting. I just want to be happy learning about the implications, not argumentative nor embittered! Thanks if you read my post. If this isn’t the right place to ask, I’d appreciate being pointed in a better direction. submitted by /u/AdaptivePerfection [link] [comments]  ( 9 min )
  • Open

    [D] Haven't worked in 1.5 years, how to prepare to reenter the field?
    I quit my job as a Deep Learning Researcher at an AI Research Lab at a top hardware manufacturer a year and a half ago due to mental health issues. I’ve spent the time off strictly working on myself, no papers have been read, no code has been written, and I even bought an ATI GPU for gaming (heresy, I know). I’m finally getting to a place where I feel ready to rejoin the field and was wondering about thoughts on how to go about it. I spent 4.5 years at the lab, focused on computer vision in geospatial imagery, did some NLP, and some application development (mostly for PoCs, or non-profit collaborations). Published papers at CVPR, some journals, ran workshops at conferences, and led projects around AI for Good. Used a lot of PyTorch, TF, Kubernetes, Docker, etc. What would you do to touch base with the field and get ready to start interviewing for a position focused on deep learning — engineering or research (I’m gonna try for both and see what company is a better fit). I clearly need to go read up on ChatGPT, was gonna go through CVPR and NeurIPS papers, and was planning on reviewing my DL textbook as well as Hastie/Tibshirani, but I’m very much so out of the loop when it comes to current interview questions/trends in the field. What would you do? submitted by /u/chonbas [link] [comments]  ( 9 min )
    [D] Clustering time series data
    Hi, I am trying to cluster time series data (between 2 and 10 timepoints per series) and I am completely new to the field. I've just found the package "tslearn", which seems to be a suitable tool for the task. As far as I understand, tslearn can perform both clustering and dynamic time warping at the same time. However, I am not entirely sure it can handle missing data, and I would really like to avoid imputing missing data in my project. Does anyone know if tslearn.clustering can deal with missing data? Also, are you aware of any alternative packages I could use? Thanks in advance submitted by /u/_luqui [link] [comments]  ( 8 min )
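    For what it's worth, tslearn does handle series of different lengths (it NaN-pads them, and the DTW-based methods accept that); whether it tolerates missing values inside a series is a separate question. A minimal sketch of the variable-length case:

    ```python
    import numpy as np
    from tslearn.utils import to_time_series_dataset
    from tslearn.clustering import TimeSeriesKMeans

    series = [np.array([1.0, 2.0]),            # 2 timepoints
              np.array([1.0, 1.5, 2.2, 3.0]),  # 4 timepoints
              np.array([5.0, 4.0, 3.5])]       # 3 timepoints
    X = to_time_series_dataset(series)         # NaN-padded (3, 4, 1) array

    # DTW aligns series of unequal length, so k-means with a DTW metric
    # works directly on the padded dataset.
    km = TimeSeriesKMeans(n_clusters=2, metric="dtw", random_state=0)
    print(km.fit_predict(X))
    ```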
    [P] testing the waters for a capable co-founder to help me finish development of an AI matchmaker
    Testing the waters to see if I can find the right person to help me finish building buzr.org. Buzr's an AI matchmaker, but with some very weird twists, such as an onboarding process that happens by letting the user talk to their favorite hero/celeb, similar to how banterai.app was made. Buzr's meant to be accessible to everyone from the start. I'm currently undecided whether Buzr will end up being open source and zero cost to the user, or for profit but with an insanely affordable price point that ensures just about anyone who wants to participate is able to. I'm almost finished with the backend (including the first iteration of training an open source model from HF). Honestly, I have somewhere around 70 percent of the work needed to launch done. Definitely a few things, including the front end, that need some work. I'm still not entirely sure I'm going to be able to find the right person here, but I figure I'll give it a shot. Ideally someone full stack (web) with some kind of ML background. [zack@humanity.rocks](mailto:zack@humanity.rocks) Twitter: zalehm buzr.org humanity.rocks submitted by /u/kingofthedoodles1 [link] [comments]  ( 9 min )
    What’s Your Problem? An Oft-Missing Section in AI Papers
    submitted by /u/tomssilver [link] [comments]  ( 8 min )
    [Research] Does anyone know a dataset which contains images of many invoices, and the associated text?
    I'm looking for a large amount of random invoices in various formats for some OCR and formatting recognition. submitted by /u/bettercallaCPA [link] [comments]  ( 8 min )
    [P] faster latent diffusion
    This is a method I thought of to potentially make diffusion-type models faster. The theoretical justification is the same as for "consistency models" but was developed independently. It works OK for MNIST, but that doesn't mean much. I don't have as many GPUs as OpenAI or MidJourney. Abbreviations: NN = neural network; LS = latent space. Background on diffusion: NN autoencoders can be trained to convert between images and a LS where distance corresponds to image similarity. Like how most images are just "noise", most of that LS does not correspond to meaningful images. For a simpler explanation, let's consider a simplified diffusion-based image generation model. It has a 2-dimensional LS and 2 image categories: cats and dogs. …  ( 10 min )
    [D] For the super-resolution task with diffusion, aren't the input and output resolutions different?
    In the case of unconditional image synthesis using diffusion with a U-Net, the input and output resolutions of the U-Net are generally the same. However, when training for the super-resolution task, the input image's resolution differs from the output image's resolution. How is this achieved? Is the final convolution layer of the U-Net adjusted? submitted by /u/ConfidentSprinkles54 [link] [comments]  ( 8 min )
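    One common recipe (an SR3-style sketch of the convention, not necessarily what any particular paper does): the low-resolution image is first upsampled to the target resolution and concatenated channel-wise with the noisy sample, so the U-Net's input and output spatial resolutions actually match; only the channel count of the first convolution changes.

    ```python
    import torch
    import torch.nn.functional as F

    lr = torch.randn(1, 3, 64, 64)    # low-resolution conditioning image
    xt = torch.randn(1, 3, 256, 256)  # noisy high-res sample at timestep t

    # Upsample the LR image to the output resolution and stack it with the
    # noisy sample along channels: the U-Net then sees 6 input channels at
    # 256x256 and predicts 3 output channels at the same 256x256.
    lr_up = F.interpolate(lr, size=xt.shape[-2:], mode="bicubic",
                          align_corners=False)
    unet_input = torch.cat([xt, lr_up], dim=1)  # shape (1, 6, 256, 256)
    print(unet_input.shape)
    ```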
    [D] When less is more in the hierarchical forecasting case.
    There are many cases when simplicity is preferred over unnecessary complications. This was our experience in our ICML paper on hierarchically coherent networks for constrained probabilistic forecasting (HINT, https://arxiv.org/abs/2305.07089). We explored the accuracy of the latest AI-based hierarchical solutions, including AWS's HierE2E, and discovered that the Vector Autoregressive (VAR) approaches showed only modest improvements or even deteriorations over reconciled ARIMA. With our simplified HINT approach, we moved away from multivariate inputs and found 13 percent accuracy improvements over existing solutions (MinTrace, PERMBU, HierE2E) on four hierarchical datasets. Fitting Vector Autoregressive models for the hierarchical forecasting task can be challenging, as the number of model parameters grows fast and spurious correlations characterize the setting. As always, let us know your thoughts. Experiments here; a GitHub star ⭐ is appreciated: https://nixtla.github.io/neuralforecast/examples/hierarchicalnetworks.html https://github.com/Nixtla/hierarchicalforecast/tree/main/experiments/hierarchical_baselines submitted by /u/Kinferatttu [link] [comments]  ( 8 min )
    [D] Machine Learning C books
    Are there any modern books on machine learning using C? I know there are tons of older ones, but I'd prefer something more recent. The majority of books I've come across are for Python, and of course the principles still apply, but I want to do it in C, although I'm comfortable with Python. Thanks for any help, and sorry if this question has been asked in the past. submitted by /u/_W0z [link] [comments]  ( 8 min )
    [R] Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing
    Project page: https://arielnlee.github.io/PatchMixing/ Twitter thread: https://twitter.com/natanielruizg/status/1675899509396631555?s=20 Paper: https://arxiv.org/abs/2306.17848 Transformers vs. ConvNets. Abstract: Vision transformers (ViTs) have significantly changed the computer vision landscape and have periodically exhibited superior performance in vision tasks compared to convolutional neural networks (CNNs). Although the jury is still out on which model type is superior, each has unique inductive biases that shape their learning and generalization performance. For example, ViTs have interesting properties with respect to early layer non-local feature dependence, as well as self-attention mechanisms which enhance learning flexibility, enabling them to ignore out-of-context image information more effectively. We hypothesize that this power to ignore out-of-context information (which we name patch selectivity), while integrating in-context information in a non-local manner in early layers, allows ViTs to more easily handle occlusion. In this study, our aim is to see whether we can have CNNs simulate this ability of patch selectivity by effectively hardwiring this inductive bias using Patch Mixing data augmentation, which consists of inserting patches from another image onto a training image and interpolating labels between the two image classes. Specifically, we use Patch Mixing to train state-of-the-art ViTs and CNNs, assessing its impact on their ability to ignore out-of-context patches and handle natural occlusions. We find that ViTs do not improve nor degrade when trained using Patch Mixing, but CNNs acquire new capabilities to ignore out-of-context information and improve on occlusion benchmarks, leaving us to conclude that this training method is a way of simulating in CNNs the abilities that ViTs already possess. We will release our Patch Mixing implementation and proposed datasets for public use. submitted by /u/StrawberryNumberNine [link] [comments]  ( 9 min )
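    From the abstract alone, the augmentation can be sketched roughly as follows (the patch-sampling scheme and the exact label rule here are guesses of ours, not the authors' released implementation):

    ```python
    import torch
    import torch.nn.functional as F

    def patch_mixing(images, labels, num_classes, patch=16, ratio=0.3):
        # Paste patches from a shuffled copy of the batch into each image
        # and interpolate the one-hot labels by the replaced fraction.
        B, C, H, W = images.shape
        perm = torch.randperm(B)
        gh, gw = H // patch, W // patch
        mask = (torch.rand(B, 1, gh, gw) < ratio).float()
        mask = F.interpolate(mask, size=(H, W), mode="nearest")
        mixed = images * (1 - mask) + images[perm] * mask
        lam = mask.mean(dim=(1, 2, 3))          # fraction of pixels replaced
        y1 = F.one_hot(labels, num_classes).float()
        y2 = F.one_hot(labels[perm], num_classes).float()
        return mixed, (1 - lam)[:, None] * y1 + lam[:, None] * y2
    ```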
    [P] Transferring thoughts to a chatbot - how would you build it?
    I am building a “conversational chatbot” trained on my thoughts (just as an experiment) and want to know how you would do it. The idea is to have a chatbot in a few years that has my views on different topics, the books I've read and am reading (where I summarise chapters and how I felt throughout), the stages of building my startup, etc. Everything that is happening now and for the next 18 months. It's like a diary you can converse with after a set amount of time. Let me know how you would go about building this. How complicated or simple would you make it? Technologies, etc.? It would be fun to discuss more, I guess. submitted by /u/DragonflyLatter9068 [link] [comments]  ( 8 min )
    [R] Chat with documents using LangChain and OpenAI
    Over the past few months, I've been captivated by the flood of apps on Product Hunt claiming to be the ultimate "ChatGPT for your documents". The question that lingered in my mind was, "How do these apps actually work?" Curiosity led me down an exciting path of discovery, and I stumbled upon a framework that I think is revolutionizing app development around Large Language Models: LangChain. I learned that developing a "ChatGPT for your documents" is easily achievable through three broad workflows combined with access to the OpenAI API. In fact, I went ahead and prototyped such a system on Streamlit. Step 1 - The Setup: Store your documents as embeddings. In the first step, use Document Loaders (at least 100 are available), provided by LangChain, to convert anything fr…  ( 9 min )
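    To make the workflow concrete, here is a minimal sketch of the recipe (assuming the 2023-era langchain and openai packages; import paths and class names have since been reshuffled, and the file name and question are just examples):

    ```python
    from langchain.document_loaders import PyPDFLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import FAISS
    from langchain.chat_models import ChatOpenAI
    from langchain.chains import RetrievalQA

    # Step 1: load and chunk a document, then store the chunks as embeddings.
    docs = PyPDFLoader("report.pdf").load()
    chunks = RecursiveCharacterTextSplitter(chunk_size=1000).split_documents(docs)
    store = FAISS.from_documents(chunks, OpenAIEmbeddings())

    # Steps 2-3: retrieve relevant chunks for a question and answer over them.
    qa = RetrievalQA.from_chain_type(llm=ChatOpenAI(),
                                     retriever=store.as_retriever())
    print(qa.run("What are the key findings?"))
    ```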
    [D] Is there a difference between p-tuning and prefix tuning?
    If I understood correctly, p-tuning (from "GPT Understands, Too") and prefix-tuning (from "Prefix-Tuning: Optimizing Continuous Prompts for Generation") are mostly the same idea: learning continuous prompts via backpropagation. I've seen some differences (prefix-tuning seems to use, well, only prefixes, while p-tuning seems to use a small "model" to generate the embeddings for the prompt), but mostly they seem very similar. Is that true? Do you know why https://huggingface.co/docs/peft/index implemented both? Thanks in advance! submitted by /u/ez613 [link] [comments]  ( 8 min )
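    A side-by-side sketch using the peft library the poster links (the config classes are peft's own; the hyperparameters are placeholders): the two methods are set up almost identically, which underlines how close the ideas are.

    ```python
    from transformers import AutoModelForCausalLM
    from peft import (PrefixTuningConfig, PromptEncoderConfig,
                      get_peft_model, TaskType)

    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Prefix-tuning: learned continuous prefixes (key/value vectors).
    prefix_cfg = PrefixTuningConfig(task_type=TaskType.CAUSAL_LM,
                                    num_virtual_tokens=20)

    # P-tuning: virtual tokens produced by a small prompt encoder
    # (an MLP/LSTM) rather than free-standing embeddings.
    ptuning_cfg = PromptEncoderConfig(task_type=TaskType.CAUSAL_LM,
                                      num_virtual_tokens=20)

    peft_model = get_peft_model(model, prefix_cfg)  # or ptuning_cfg
    peft_model.print_trainable_parameters()
    ```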
    [D] What is currently the best theoretical book (or notes) about Convolutional Neural Networks?
    I have a degree in maths and I am interested in learning about Convolutional Neural Networks from a mathematical/theoretical point of view. Can someone point me towards some resources? I already have a basic knowledge of Convolutional Neural Networks (and other neural networks). submitted by /u/Wonderful_Energy_15 [link] [comments]  ( 8 min )
    [R] Observing Stable Results with Transfer Learning
    I have encountered an interesting phenomenon while working with StyleGAN2-ADA. Initially, I trained a model and observed that it "diverged". I say "diverged" because the generator seems to use repetitive colors in all synthetic images; nothing special, but it is noticeable when I generate large quantities of images. To investigate further, I decided to train a new model using a previous snapshot of the "diverged" model as a starting point. However, to my surprise, the new model, trained with transfer learning, appears to exhibit stable behavior without any signs of divergence. I am curious to understand why this discrepancy occurs. Could the initial model state or the transfer-learning process itself play a role in preventing the divergence? Are there any underlying factors I might be missing when training a new model in this manner? Any insights or suggestions would be greatly appreciated. submitted by /u/Minute-Session8703 [link] [comments]  ( 8 min )
    [D] Starting MSc AI September, unsure what to do
    Hi folks, I'm about to start an MSc in AI; here's the course structure: https://www.kcl.ac.uk/study/postgraduate-taught/courses/artificial-intelligence-msc I've been studying the classic material at my own pace, like Andrew Ng's Coursera ML courses and a bit of fast.ai. I was also offered a place in a free three-month Data Analyst bootcamp that finishes exactly before uni starts. Hence I've been wondering whether I should do the bootcamp or rather study on my own before the MSc. What do you guys recommend? submitted by /u/Wise_Technician_8475 [link] [comments]  ( 8 min )
    [D] Best embedding models for retrieving mixed natural language and source code?
    I'm building a semantic search for a technical mailing list where emails contain natural language and source code (short scripts and long patches). On the first try, I used OpenAI's embeddings (ada-002), and the results are amazing, but a free alternative that I can run locally without rate limits or a credit card would be great. Are there other embedding models that do not perform significantly worse in terms of quality on such a dataset? I have seen the MTEB Leaderboard, but it does not seem to cover source-code data. Thanks in advance! submitted by /u/LinqLover [link] [comments]  ( 8 min )
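    One cheap way to benchmark a local alternative before committing (a sketch using sentence-transformers; the model name is a popular general-purpose example, not one tuned for code):

    ```python
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # local, no API key

    texts = [
        "def quicksort(arr): ...",          # a code snippet
        "How do I sort a list in Python?",  # a natural-language query
    ]
    emb = model.encode(texts, convert_to_tensor=True)
    print(util.cos_sim(emb[0], emb[1]))     # similarity on the mixed pair
    ```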
    [D] To all the machine learning engineers: most difficult model task/type you’ve ever had to work with?
    Got to say, for me it was putting an object detection model to use on videos in 2021: figuring out how to create a custom dataset, train a model on it without messing up some super obscure parameter in the config, and converting the weights to ONNX. submitted by /u/After_Magician_8438 [link] [comments]  ( 8 min )
    [P] Sweep: Open Source AI junior developer that writes and fixes its own pull requests
    Hi! I'm one of the developers of Sweep AI (YC S23). You can tell Sweep about the bugs and feature requests and Sweep will create a PR with code changes. You can use Sweep Chat to ideate and clarify requirements with Sweep, eventually creating a pull request. Then anywhere GitHub takes text (PR comments, code comments), Sweep can read it and tweak the PR. We built Sweep by integrating search + GPT4-32k into Github, with a couple more tricks :D. To find out more: Get started with Sweep at https://github.com/sweepai/sweep#-getting-started See more details at https://sweep.dev/ and our docs at https://docs.sweep.dev/ submitted by /u/williamsweep [link] [comments]  ( 8 min )
    [P] Longformer for large document summarization
    Does anyone here have experience training a Longformer model for large document summarization? I am working on a project, and after doing a solid amount of research I've decided to train a Longformer model to summarize large documents for a specific industry. I was wondering if anyone has had experience with using it and tuning its hyperparameters. I have not started the training just yet; I am in the process of converting the documents and tokenizing them. submitted by /u/Xeranter [link] [comments]  ( 8 min )
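    No direct experience with that exact setup, but a starting-point sketch (assuming the Hugging Face transformers stack; the checkpoint is the standard Longformer Encoder-Decoder base model, and the generation hyperparameters are placeholders):

    ```python
    import torch
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    name = "allenai/led-base-16384"  # LED: the seq2seq Longformer variant
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSeq2SeqLM.from_pretrained(name)

    document = "Your very long industry document goes here ..."
    inputs = tok(document, return_tensors="pt",
                 truncation=True, max_length=16384)

    # LED expects global attention on at least the first token.
    global_mask = torch.zeros_like(inputs["input_ids"])
    global_mask[:, 0] = 1

    ids = model.generate(inputs["input_ids"],
                         attention_mask=inputs["attention_mask"],
                         global_attention_mask=global_mask,
                         max_length=256, num_beams=4)
    print(tok.decode(ids[0], skip_special_tokens=True))
    ```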
    [D] Vertex AI Endpoint Alternative
    Hello everyone! New to the ML world. I recently started using Vertex AI and AutoML for some personal projects and was amazed at how simple and effective the AutoML feature is. It accomplished what I needed for text classification. The problem is that using the Model Endpoint to perform API predictions against the trained model costs $5 per 1,000 characters. This can turn out very costly if, for example, there are many requests a day. Is there an alternative to this whole scenario, or a way to leverage my own endpoint that's cheaper for prediction API requests? submitted by /u/ywb_win [link] [comments]  ( 8 min )
    [R] Classifier-Free Guidance can be applied to LLMs too. It generally gives results of a model twice the size you apply it to. New SotA on LAMBADA with LLaMA-7B over PaLM-540B and plenty other experimental results.
    submitted by /u/Affectionate-Fish241 [link] [comments]  ( 8 min )
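    The arithmetic at the heart of the title, as a sketch (standard classifier-free guidance applied to next-token logits; the function and names are ours, not the paper's code): run the model once with the conditioning text and once without, then extrapolate past the unconditional distribution.

    ```python
    import torch

    def cfg_next_token_logits(cond_logits: torch.Tensor,
                              uncond_logits: torch.Tensor,
                              gamma: float = 1.5) -> torch.Tensor:
        # gamma = 1 recovers ordinary conditional sampling;
        # gamma > 1 pushes the output to follow the prompt more strongly.
        return uncond_logits + gamma * (cond_logits - uncond_logits)
    ```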
  • Open

    Three advantages of non-AI models
    It’s hard to imagine doing anything like Midjourney’s image generation without neural networks. The same is true of ChatGPT’s text generation. But a lot of business tasks do not require AI, and in fact would be better off not using AI. Three reasons why: Statistical models are easier to interpret. When a model has a […] Three advantages of non-AI models first appeared on John D. Cook.  ( 5 min )
    Eccentricity of bivariate normal level sets
    Suppose you have a bivariate normal distribution with correlation ρ. Then the level sets of the density function are ellipses, and the eccentricity e of the ellipses is a function of the correlation ρ. For example, suppose ρ = 0.8. Here’s a plot of the density function f(x, y). And here is a […] Eccentricity of bivariate normal level sets first appeared on John D. Cook.  ( 5 min )
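    The elided relation can be reconstructed under the usual equal-variance assumption (our assumption; the excerpt drops the post's exact hypotheses):

    ```latex
    % Assume unit marginal variances, so the covariance matrix is
    %   \Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix},
    % with eigenvalues 1+\rho and 1-\rho. For 0 < \rho < 1 the level-set
    % ellipses have squared axis ratio b^2/a^2 = (1-\rho)/(1+\rho), giving
    \[
      e^2 = 1 - \frac{b^2}{a^2} = \frac{2\rho}{1+\rho}.
    \]
    ```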
  • Open

    AI, Digital Twins to Unleash Next Wave of Climate Research Innovation
    AI and accelerated computing will help climate researchers achieve the miracles they need to achieve breakthroughs in climate research, NVIDIA founder and CEO Jensen Huang said during a keynote Monday at the Berlin Summit for the Earth Virtualization Engines initiative. “Richard Feynman once said that ‘what I can’t create, I don’t understand’ and that’s the Read article >  ( 6 min )
  • Open

    Retain original PDF formatting to view translated documents with Amazon Textract, Amazon Translate, and PDFBox
    Companies across various industries create, scan, and store large volumes of PDF documents. In many cases, the content is text-heavy and often written in a different language and requires translation. To address this, you need an automated solution to extract the contents within these PDFs and translate them quickly and cost-efficiently. Many businesses have diverse […]  ( 8 min )
    Auto-labeling module for deep learning-based Advanced Driver Assistance Systems on AWS
    In computer vision (CV), adding tags to identify objects of interest or bounding boxes to locate the objects is called labeling. It’s one of the prerequisite tasks to prepare training data to train a deep learning model. Hundreds of thousands of work hours are spent generating high-quality labels from images and videos for various CV […]  ( 9 min )
  • Open

    The Golden Rule and the AI Utility Function – Part I
    “Do unto others as you would have them do unto you.” The Golden Rule.  This may have been the first Sunday School lesson I learned (thanks, Mrs. Monroe). The Golden Rule is a moral principle that states that you should treat others as you would like others to treat you. It is a foundational principle… Read More »The Golden Rule and the AI Utility Function – Part I The post The Golden Rule and the AI Utility Function – Part I appeared first on Data Science Central.  ( 22 min )
  • Open

    Interpreting loss curves & returns in DDQN
    I'm training a DDQN agent to play a simple video game from pixel inputs. Cumulative returns and predicted Q-values on a hold-out set both increase over the course of training, as expected. However, I don't understand the progression of the loss. Can you help me make sense of the loss function? The reason I'm confused: with policy learning, I'm used to tracking only the returns and (some measure of) validation performance, because the loss function doesn't really track returns. With Q-learning, however, the loss is simply the distance between the predicted and bootstrapped state values, and both are optimised over the course of learning. So shouldn't their estimates converge over time, and therefore the loss go down? Thanks for your help! [Figure: average per-episode (a) returns, (b) training loss, and (c) Q-values predicted on a hold-out set, each measured at the end of every episode during training.] submitted by /u/desperateEfforts1 [link] [comments]  ( 8 min )
    Stable baselines! Where my people at?
    Hello. I think this is a great library. I am actively using it in some research. I have questions about callbacks, and I want to learn more so I can make some extensions. Is there a community? The GitHub page mentions this sub and a Discord. submitted by /u/Independent-Scale564 [link] [comments]  ( 8 min )
  • Open

    The Bayesian Learning Rule. (arXiv:2107.04562v3 [stat.ML] UPDATED)
    We show that many machine-learning algorithms are specific instances of a single algorithm called the Bayesian learning rule. The rule, derived from Bayesian principles, yields a wide range of algorithms from fields such as optimization, deep learning, and graphical models. This includes classical algorithms such as ridge regression, Newton's method, and the Kalman filter, as well as modern deep-learning algorithms such as stochastic-gradient descent, RMSprop, and Dropout. The key idea in deriving such algorithms is to approximate the posterior using candidate distributions estimated by using natural gradients. Different candidate distributions result in different algorithms and further approximations to natural gradients give rise to variants of those algorithms. Our work not only unifies, generalizes, and improves existing algorithms, but also helps us design new ones.
    Cooperative Multi-Agent Deep Reinforcement Learning for Reliable and Energy-Efficient Mobile Access via Multi-UAV Control. (arXiv:2210.00945v2 [cs.LG] UPDATED)
    This paper addresses a novel multi-agent deep reinforcement learning (MADRL)-based positioning algorithm for multiple unmanned aerial vehicles (UAVs) collaboration (i.e., UAVs work as mobile base stations). The primary objective of the proposed algorithm is to establish dependable mobile access networks for cellular vehicle-to-everything (C-V2X) communication, thereby facilitating the realization of high-quality intelligent transportation systems (ITS). The reliable mobile access services can be achieved in following two ways, i.e., i) energy-efficient UAV operation and ii) reliable wireless communication services. For energy-efficient UAV operation, the reward of our proposed MADRL algorithm contains the features for UAV energy consumption models in order to realize efficient operations. Furthermore, for reliable wireless communication services, the quality of service (QoS) requirements of individual users are considered as a part of rewards and 60GHz mmWave radio is used for mobile access. This paper considers the 60GHz mmWave access for utilizing the benefits of i) ultra-wide-bandwidth for multi-Gbps high-speed communications and ii) high-directional communications for spatial reuse that is obviously good for densely deployed users. Lastly, the comprehensive and data-intensive performance evaluation of the proposed MADRL-based algorithm for multi-UAV positioning is conducted in this paper. The results of these evaluations demonstrate that the proposed algorithm outperforms other existing algorithms.
    Sphere2Vec: A General-Purpose Location Representation Learning over a Spherical Surface for Large-Scale Geospatial Predictions. (arXiv:2306.17624v1 [cs.CV])
    Generating learning-friendly representations for points in space is a fundamental and long-standing problem in ML. Recently, multi-scale encoding schemes (such as Space2Vec and NeRF) were proposed to directly encode any point in 2D/3D Euclidean space as a high-dimensional vector, and have been successfully applied to various geospatial prediction and generative tasks. However, all current 2D and 3D location encoders are designed to model point distances in Euclidean space. So when applied to large-scale real-world GPS coordinate datasets, which require distance metric learning on the spherical surface, both types of models can fail due to the map projection distortion problem (2D) and the spherical-to-Euclidean distance approximation error (3D). To solve these problems, we propose a multi-scale location encoder called Sphere2Vec which can preserve spherical distances when encoding point coordinates on a spherical surface. We developed a unified view of distance-preserving encoding on spheres based on the double Fourier sphere (DFS). We also provide theoretical proof that the Sphere2Vec preserves the spherical surface distance between any two points, while existing encoding schemes do not. Experiments on 20 synthetic datasets show that Sphere2Vec can outperform all baseline models on all these datasets with up to 30.8% error rate reduction. We then apply Sphere2Vec to three geo-aware image classification tasks - fine-grained species recognition, Flickr image recognition, and remote sensing image classification. Results on 7 real-world datasets show the superiority of Sphere2Vec over multiple location encoders on all three tasks. Further analysis shows that Sphere2Vec outperforms other location encoder models, especially in the polar regions and data-sparse areas because of its nature for spherical surface distance preservation. Code and data are available at https://gengchenmai.github.io/sphere2vec-website/.
    Sufficient-Statistic Memory AMP. (arXiv:2112.15327v4 [cs.IT] UPDATED)
    Approximate message passing (AMP) type algorithms have been widely used in the signal reconstruction of certain large random linear systems. A key feature of the AMP-type algorithms is that their dynamics can be correctly described by state evolution. While state evolution is a useful analytic tool, its convergence is not guaranteed. To solve the convergence problem of the state evolution of AMP-type algorithms in principle, this paper proposes a sufficient-statistic memory AMP (SS-MAMP) algorithm framework under the conditions of right-unitarily invariant sensing matrices, Lipschitz-continuous local processors and the sufficient-statistic constraint (i.e., the current message of each local processor is a sufficient statistic of the signal vector given the current and all preceding messages). We show that the covariance matrices of SS-MAMP are L-banded and convergent, which is an optimal framework (from the local MMSE/LMMSE perspective) for AMP-type algorithms given the Lipschitz-continuous local processors. Given an arbitrary MAMP, we can construct an SS-MAMP by damping, which not only ensures the convergence of the state evolution, but also preserves the orthogonality, i.e., its dynamics can be correctly described by state evolution. As a byproduct, we prove that the Bayes-optimal orthogonal/vector AMP (BO-OAMP/VAMP) is an SS-MAMP. As an example, we construct a sufficient-statistic Bayes-optimal MAMP (SS-BO-MAMP) whose state evolution converges to the minimum (i.e., Bayes-optimal) mean square error (MSE) predicted by replica methods when it has a unique fixed point. In addition, the MSE of SS-BO-MAMP is not worse than the original BO-MAMP. Finally, simulations are provided to support the theoretical results.
    DisPlacing Objects: Improving Dynamic Vehicle Detection via Visual Place Recognition under Adverse Conditions. (arXiv:2306.17536v1 [cs.CV])
    Can knowing where you are assist in perceiving objects in your surroundings, especially under adverse weather and lighting conditions? In this work we investigate whether a prior map can be leveraged to aid in the detection of dynamic objects in a scene without the need for a 3D map or pixel-level map-query correspondences. We contribute an algorithm which refines an initial set of candidate object detections and produces a refined subset of highly accurate detections using a prior map. We begin by using visual place recognition (VPR) to retrieve a reference map image for a given query image, then use a binary classification neural network that compares the query and mapping image regions to validate the query detection. Once our classification network is trained, on approximately 1000 query-map image pairs, it is able to improve the performance of vehicle detection when combined with an existing off-the-shelf vehicle detector. We demonstrate our approach using standard datasets across two cities (Oxford and Zurich) under different settings of train-test separation of map-query traverse pairs. We further emphasize the performance gains of our approach against alternative design choices and show that VPR suffices for the task, eliminating the need for precise ground truth localization.
    Expressive architectures enhance interpretability of dynamics-based neural population models. (arXiv:2212.03771v4 [q-bio.NC] UPDATED)
    Artificial neural networks that can recover latent dynamics from recorded neural activity may provide a powerful avenue for identifying and interpreting the dynamical motifs underlying biological computation. Given that neural variance alone does not uniquely determine a latent dynamical system, interpretable architectures should prioritize accurate and low-dimensional latent dynamics. In this work, we evaluated the performance of sequential autoencoders (SAEs) in recovering latent chaotic attractors from simulated neural datasets. We found that SAEs with widely-used recurrent neural network (RNN)-based dynamics were unable to infer accurate firing rates at the true latent state dimensionality, and that larger RNNs relied upon dynamical features not present in the data. On the other hand, SAEs with neural ordinary differential equation (NODE)-based dynamics inferred accurate rates at the true latent state dimensionality, while also recovering latent trajectories and fixed point structure. Ablations reveal that this is mainly because NODEs (1) allow use of higher-capacity multi-layer perceptrons (MLPs) to model the vector field and (2) predict the derivative rather than the next state. Decoupling the capacity of the dynamics model from its latent dimensionality enables NODEs to learn the requisite low-D dynamics where RNN cells fail. Additionally, the fact that the NODE predicts derivatives imposes a useful autoregressive prior on the latent states. The suboptimal interpretability of widely-used RNN-based dynamics may motivate substitution for alternative architectures, such as NODE, that enable learning of accurate dynamics in low-dimensional latent spaces.
    Fact or Artifact? Revise Layer-wise Relevance Propagation on various ANN Architectures. (arXiv:2302.12317v2 [cs.LG] UPDATED)
    Layer-wise relevance propagation (LRP) is a widely used and powerful technique to reveal insights into various artificial neural network (ANN) architectures. LRP is often used in the context of image classification. The aim is to understand, which parts of the input sample have highest relevance and hence most influence on the model prediction. Relevance can be traced back through the network to attribute a certain score to each input pixel. Relevance scores are then combined and displayed as heat maps and give humans an intuitive visual understanding of classification models. Opening the black box to understand the classification engine in great detail is essential for domain experts to gain trust in ANN models. However, there are pitfalls in terms of model-inherent artifacts included in the obtained relevance maps, that can easily be missed. But for a valid interpretation, these artifacts must not be ignored. Here, we apply and revise LRP on various ANN architectures trained as classifiers on geospatial and synthetic data. Depending on the network architecture, we show techniques to control model focus and give guidance to improve the quality of obtained relevance maps to separate facts from artifacts.
    GRIL: A $2$-parameter Persistence Based Vectorization for Machine Learning. (arXiv:2304.04970v2 [cs.LG] UPDATED)
    $1$-parameter persistent homology, a cornerstone in Topological Data Analysis (TDA), studies the evolution of topological features such as connected components and cycles hidden in data. It has been applied to enhance the representation power of deep learning models, such as Graph Neural Networks (GNNs). To enrich the representations of topological features, here we propose to study $2$-parameter persistence modules induced by bi-filtration functions. In order to incorporate these representations into machine learning models, we introduce a novel vector representation called Generalized Rank Invariant Landscape (GRIL) for $2$-parameter persistence modules. We show that this vector representation is $1$-Lipschitz stable and differentiable with respect to underlying filtration functions and can be easily integrated into machine learning models to augment encoding topological features. We present an algorithm to compute the vector representation efficiently. We also test our methods on synthetic and benchmark graph datasets, and compare the results with previous vector representations of $1$-parameter and $2$-parameter persistence modules. Further, we augment GNNs with GRIL features and observe an increase in performance indicating that GRIL can capture additional features enriching GNNs. We make the complete code for the proposed method available at https://github.com/soham0209/mpml-graph.
    Online Learning with Set-Valued Feedback. (arXiv:2306.06247v2 [cs.LG] UPDATED)
    We study a variant of online multiclass classification where the learner predicts a single label but receives a \textit{set of labels} as feedback. In this model, the learner is penalized for not outputting a label contained in the revealed set. We show that unlike online multiclass learning with single-label feedback, deterministic and randomized online learnability are \textit{not equivalent} even in the realizable setting with set-valued feedback. Accordingly, we give two new combinatorial dimensions, named the Set Littlestone and Measure Shattering dimension, that tightly characterize deterministic and randomized online learnability respectively in the realizable setting. In addition, we show that the Measure Shattering dimension tightly characterizes online learnability in the agnostic setting. Finally, we show that practical learning settings like online multilabel ranking, online multilabel classification, and online interval learning are specific instances of our general framework.
    UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers. (arXiv:2301.13741v3 [cs.CV] UPDATED)
    Real-world data contains a vast amount of multimodal information, among which vision and language are the two most representative modalities. Moreover, increasingly heavier models, \textit{e}.\textit{g}., Transformers, have attracted the attention of researchers to model compression. However, how to compress multimodal models, especially vision-language Transformers, is still under-explored. This paper proposes the \textbf{U}nified and \textbf{P}r\textbf{o}gressive \textbf{P}runing (\textbf{\emph{UPop}}) as a universal vision-language Transformer compression framework, which incorporates 1) unifiedly searching multimodal subnets in a continuous optimization space from the original model, which enables automatic assignment of pruning ratios among compressible modalities and structures; 2) progressively searching and retraining the subnet, which maintains convergence between the search and retrain to attain higher compression ratios. Experiments on various tasks, datasets, and model architectures demonstrate the effectiveness and versatility of the proposed UPop framework. The code is available at https://github.com/sdc17/UPop.
    First-order ANIL learns linear representations despite misspecified latent dimension. (arXiv:2303.01335v2 [cs.LG] UPDATED)
    Due to its empirical success in few-shot classification and reinforcement learning, meta-learning has recently received significant interest. Meta-learning methods leverage data from previous tasks to learn a new task in a sample-efficient manner. In particular, model-agnostic methods look for initialisation points from which gradient descent quickly adapts to any new task. Although it has been empirically suggested that such methods perform well by learning shared representations during pretraining, there is limited theoretical evidence of such behavior. More importantly, it has not been rigorously shown that these methods still learn a shared structure, despite architectural misspecifications. In this direction, this work shows, in the limit of an infinite number of tasks, that first-order ANIL with a linear two-layer network architecture successfully learns linear shared representations. This result even holds with a misspecified network parameterisation; having a width larger than the dimension of the shared representations results in an asymptotically low-rank solution. The learnt solution then yields a good adaptation performance on any new task after a single gradient step. Overall this illustrates how well model-agnostic methods such as first-order ANIL can learn shared representations.
    On the Existence of a Complexity in Fixed Budget Bandit Identification. (arXiv:2303.09468v2 [stat.ML] UPDATED)
    In fixed budget bandit identification, an algorithm sequentially observes samples from several distributions up to a given final time. It then answers a query about the set of distributions. A good algorithm will have a small probability of error. While that probability decreases exponentially with the final time, the best attainable rate is not known precisely for most identification tasks. We show that if a fixed budget task admits a complexity, defined as a lower bound on the probability of error which is attained by the same algorithm on all bandit problems, then that complexity is determined by the best non-adaptive sampling procedure for that problem. We show that there is no such complexity for several fixed budget identification tasks including Bernoulli best arm identification with two arms: there is no single algorithm that attains everywhere the best possible rate.
    Precision Anti-Cancer Drug Selection via Neural Ranking. (arXiv:2306.17771v1 [cs.LG])
    Personalized cancer treatment requires a thorough understanding of complex interactions between drugs and cancer cell lines in varying genetic and molecular contexts. To address this, high-throughput screening has been used to generate large-scale drug response data, facilitating data-driven computational models. Such models can capture complex drug-cell line interactions across various contexts in a fully data-driven manner. However, accurately prioritizing the most sensitive drugs for each cell line still remains a significant challenge. To address this, we developed neural ranking approaches that leverage large-scale drug response data across multiple cell lines from diverse cancer types. Unlike existing approaches that primarily utilize regression and classification techniques for drug response prediction, we formulated the objective of drug selection and prioritization as a drug ranking problem. In this work, we proposed two neural listwise ranking methods that learn latent representations of drugs and cell lines, and then use those representations to score drugs in each cell line via a learnable scoring function. Specifically, we developed a neural listwise ranking method, List-One, on top of the existing method ListNet. Additionally, we proposed a novel listwise ranking method, List-All, that focuses on all the sensitive drugs instead of the top sensitive drug, unlike List-One. Our results demonstrate that List-All outperforms the best baseline with significant improvements of as much as 8.6% in hit@20 across 50% test cell lines. Furthermore, our analyses suggest that the learned latent spaces from our proposed methods demonstrate informative clustering structures and capture relevant underlying biological features. Moreover, our comprehensive empirical evaluation provides a thorough and objective comparison of the performance of different methods (including our proposed ones).
    See Through the Fog: Curriculum Learning with Progressive Occlusion in Medical Imaging. (arXiv:2306.15574v2 [cs.CV] UPDATED)
    In recent years, deep learning models have revolutionized medical image interpretation, offering substantial improvements in diagnostic accuracy. However, these models often struggle with challenging images where critical features are partially or fully occluded, which is a common scenario in clinical practice. In this paper, we propose a novel curriculum learning-based approach to train deep learning models to handle occluded medical images effectively. Our method progressively introduces occlusion, starting from clear, unobstructed images and gradually moving to images with increasing occlusion levels. This ordered learning process, akin to human learning, allows the model to first grasp simple, discernable patterns and subsequently build upon this knowledge to understand more complicated, occluded scenarios. Furthermore, we present three novel occlusion synthesis methods, namely Wasserstein Curriculum Learning (WCL), Information Adaptive Learning (IAL), and Geodesic Curriculum Learning (GCL). Our extensive experiments on diverse medical image datasets demonstrate substantial improvements in model robustness and diagnostic accuracy over conventional training methodologies.
    DUET: 2D Structured and Approximately Equivariant Representations. (arXiv:2306.16058v2 [cs.LG] UPDATED)
    Multiview Self-Supervised Learning (MSSL) is based on learning invariances with respect to a set of input transformations. However, invariance partially or totally removes transformation-related information from the representations, which might harm performance for specific downstream tasks that require such information. We propose 2D strUctured and EquivarianT representations (coined DUET), which are 2d representations organized in a matrix structure, and equivariant with respect to transformations acting on the input data. DUET representations maintain information about an input transformation, while remaining semantically expressive. Compared to SimCLR (Chen et al., 2020) (unstructured and invariant) and ESSL (Dangovski et al., 2022) (unstructured and equivariant), the structured and equivariant nature of DUET representations enables controlled generation with lower reconstruction error, while controllability is not possible with SimCLR or ESSL. DUET also achieves higher accuracy for several discriminative tasks, and improves transfer learning.
    GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond. (arXiv:2211.01962v4 [cs.LG] UPDATED)
    We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making, which includes Markov decision process (MDP), partially observable Markov decision process (POMDP), and predictive state representation (PSR) as special cases. Toward finding the minimum assumption that empowers sample efficient learning, we propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation in online interactive decision making. In specific, GEC captures the hardness of exploration by comparing the error of predicting the performance of the updated policy with the in-sample training error evaluated on the historical data. We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR, where generalized regular PSR, a new tractable PSR class identified by us, includes nearly all known tractable POMDPs and PSRs. Furthermore, in terms of algorithm design, we propose a generic posterior sampling algorithm, which can be implemented in both model-free and model-based fashion, under both fully observable and partially observable settings. The proposed algorithm modifies the standard posterior sampling algorithm in two aspects: (i) we use an optimistic prior distribution that biases towards hypotheses with higher values and (ii) a loglikelihood function is set to be the empirical loss evaluated on the historical data, where the choice of loss function supports both model-free and model-based learning. We prove that the proposed algorithm is sample efficient by establishing a sublinear regret upper bound in terms of GEC. In summary, we provide a new and unified understanding of both fully observable and partially observable RL.
    Selecting Robust Features for Machine Learning Applications using Multidata Causal Discovery. (arXiv:2304.05294v5 [stat.ML] UPDATED)
    Robust feature selection is vital for creating reliable and interpretable Machine Learning (ML) models. When designing statistical prediction models in cases where domain knowledge is limited and underlying interactions are unknown, choosing the optimal set of features is often difficult. To mitigate this issue, we introduce a Multidata (M) causal feature selection approach that simultaneously processes an ensemble of time series datasets and produces a single set of causal drivers. This approach uses the causal discovery algorithms PC1 or PCMCI that are implemented in the Tigramite Python package. These algorithms utilize conditional independence tests to infer parts of the causal graph. Our causal feature selection approach filters out causally-spurious links before passing the remaining causal features as inputs to ML models (Multiple linear regression, Random Forest) that predict the targets. We apply our framework to the statistical intensity prediction of Western Pacific Tropical Cyclones (TC), for which it is often difficult to accurately choose drivers and their dimensionality reduction (time lags, vertical levels, and area-averaging). Using more stringent significance thresholds in the conditional independence tests helps eliminate spurious causal relationships, thus helping the ML model generalize better to unseen TC cases. M-PC1 with a reduced number of features outperforms M-PCMCI, non-causal ML, and other feature selection methods (lagged correlation, random), even slightly outperforming feature selection based on eXplainable Artificial Intelligence. The optimal causal drivers obtained from our causal feature selection help improve our understanding of underlying relationships and suggest new potential drivers of TC intensification.
    Vision Through the Veil: Differential Privacy in Federated Learning for Medical Image Classification. (arXiv:2306.17794v1 [cs.LG])
    The proliferation of deep learning applications in healthcare calls for data aggregation across various institutions, a practice often associated with significant privacy concerns. This concern intensifies in medical image analysis, where privacy-preserving mechanisms are paramount due to the data being sensitive in nature. Federated learning, which enables cooperative model training without direct data exchange, presents a promising solution. Nevertheless, the inherent vulnerabilities of federated learning necessitate further privacy safeguards. This study addresses this need by integrating differential privacy, a leading privacy-preserving technique, into a federated learning framework for medical image classification. We introduce a novel differentially private federated learning model and meticulously examine its impacts on privacy preservation and model performance. Our research confirms the existence of a trade-off between model accuracy and privacy settings. However, we demonstrate that strategic calibration of the privacy budget in differential privacy can uphold robust image classification performance while providing substantial privacy protection.
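The paper does not specify an implementation, but one common way to add differential privacy to a client's local training loop is DP-SGD via the Opacus library; the model, data, and noise settings below are toy placeholders:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# Toy stand-ins for one client's local medical-image data and model.
images, labels = torch.randn(256, 1, 28, 28), torch.randint(0, 2, (256,))
loader = DataLoader(TensorDataset(images, labels), batch_size=32)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

# DP-SGD: per-sample gradient clipping plus calibrated Gaussian noise.
# noise_multiplier trades accuracy against the privacy budget epsilon.
engine = PrivacyEngine()
model, optimizer, loader = engine.make_private(
    module=model, optimizer=optimizer, data_loader=loader,
    noise_multiplier=1.0, max_grad_norm=1.0)

criterion = nn.CrossEntropyLoss()
for x, y in loader:                # one local epoch on this client
    optimizer.zero_grad()
    criterion(model(x), y).backward()
    optimizer.step()
# The privatized model update is then sent to the server for federated
# averaging; raw images never leave the client.
```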
    Case-Base Neural Networks: survival analysis with time-varying, higher-order interactions. (arXiv:2301.06535v2 [stat.ML] UPDATED)
    Neural network-based survival methods can model data-driven covariate interactions. While these methods can provide better predictive performance than regression-based approaches, not all can model time-varying interactions and complex baseline hazards. To address this, we propose Case-Base Neural Networks (CBNNs) as a new approach that combines the case-base sampling framework with flexible neural network architectures. Using a novel sampling scheme and data augmentation to naturally account for censoring, we construct a feed-forward neural network that may take time as an input. CBNNs predict the probability of an event occurring at a given moment to estimate the hazard function. We compare the performance of CBNNs to regression and neural network-based survival methods in a simulation and three case studies using two time-dependent metrics. First, we examine performance on a simulation involving a complex baseline hazard and time-varying interactions to assess all methods, with CBNN outperforming competitors. Then, we apply all methods to three real data applications, with CBNNs outperforming the competing models in two studies and showing similar performance in the third. Our results highlight the benefit of combining case-base sampling with deep learning to provide a simple and flexible modeling framework for data-driven, time-varying interaction modeling of single event survival outcomes. An R package is available at https://github.com/Jesse-Islam/cbnn.
    Neural Characteristic Activation Value Analysis for Improved ReLU Network Feature Learning. (arXiv:2305.15912v2 [cs.LG] UPDATED)
We examine the characteristic activation values of individual ReLU units in neural networks. We refer to the set of input points at which a ReLU unit attains its characteristic activation values as the unit's characteristic activation set. We draw an explicit connection between the characteristic activation set and the features learned by ReLU networks. This connection leads to new insights into why various neural network normalization techniques used in modern deep learning architectures regularize and stabilize SGD optimization. Utilizing these insights, we propose a geometric approach to parameterize ReLU networks for improved feature learning. We empirically verify its usefulness with less carefully chosen initialization schemes and larger learning rates. We report improved optimization stability, faster convergence speed, and better generalization performance.
    Off-Policy Evaluation with Out-of-Sample Guarantees. (arXiv:2301.08649v3 [stat.ML] UPDATED)
    We consider the problem of evaluating the performance of a decision policy using past observational data. The outcome of a policy is measured in terms of a loss (aka. disutility or negative reward) and the main problem is making valid inferences about its out-of-sample loss when the past data was observed under a different and possibly unknown policy. Using a sample-splitting method, we show that it is possible to draw such inferences with finite-sample coverage guarantees about the entire loss distribution, rather than just its mean. Importantly, the method takes into account model misspecifications of the past policy - including unmeasured confounding. The evaluation method can be used to certify the performance of a policy using observational data under a specified range of credible model assumptions.
    Averaged Method of Multipliers for Bi-Level Optimization without Lower-Level Strong Convexity. (arXiv:2302.03407v2 [math.OC] UPDATED)
Gradient methods have become mainstream techniques for Bi-Level Optimization (BLO) in learning fields. The validity of existing works heavily relies on either a restrictive Lower-Level Strong Convexity (LLSC) condition, on solving a series of approximation subproblems with high accuracy, or on both. In this work, by averaging the upper- and lower-level objectives, we propose a single-loop Bi-level Averaged Method of Multipliers (sl-BAMM) for BLO that is simple yet efficient for large-scale problems and dispenses with the restrictive LLSC condition. We further provide a non-asymptotic convergence analysis of sl-BAMM towards KKT stationary points; the comparative advantage of our analysis lies in the absence of the strong gradient-boundedness assumption, which is always required by others. Our theory thus safely captures a wider variety of applications in deep learning, especially those where the upper-level objective is quadratic w.r.t. the lower-level variable. Experimental results demonstrate the superiority of our method.
    Uncertainty Estimation by Fisher Information-based Evidential Deep Learning. (arXiv:2303.02045v3 [cs.LG] UPDATED)
Uncertainty estimation is a key factor that makes deep learning reliable in practical applications. Recently proposed evidential neural networks explicitly account for different uncertainties by treating the network's outputs as evidence to parameterize a Dirichlet distribution, and achieve impressive performance in uncertainty estimation. However, for samples with high data uncertainty that are nonetheless annotated with one-hot labels, the evidence-learning process for the mislabeled classes is over-penalized and hindered. To address this problem, we propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL). In particular, we introduce the Fisher Information Matrix (FIM) to measure the informativeness of the evidence carried by each sample, according to which we dynamically reweight the objective loss terms to make the network focus more on the representation learning of uncertain classes. The generalization ability of our network is further improved by optimizing the PAC-Bayesian bound. As demonstrated empirically, our proposed method consistently outperforms traditional EDL-related algorithms in multiple uncertainty estimation tasks, especially in the more challenging few-shot classification settings.
    WDC Products: A Multi-Dimensional Entity Matching Benchmark. (arXiv:2301.09521v2 [cs.LG] UPDATED)
The difficulty of an entity matching task depends on a combination of multiple factors, such as the amount of corner-case pairs, the fraction of entities in the test set that have not been seen during training, and the size of the development set. Current entity matching benchmarks usually represent single points in the space along such dimensions, or they provide for the evaluation of matching methods along a single dimension, for instance the amount of training data. This paper presents WDC Products, an entity matching benchmark which provides for the systematic evaluation of matching systems along combinations of three dimensions while relying on real-world data. The three dimensions are (i) the amount of corner cases, (ii) generalization to unseen entities, and (iii) development set size (training set plus validation set). Generalization to unseen entities is a dimension not yet covered by any existing English-language benchmark, but it is crucial for evaluating the robustness of entity matching systems. Instead of learning how to match entity pairs, entity matching can also be formulated as a multi-class classification task that requires the matcher to recognize individual entities. WDC Products is the first benchmark that provides a pair-wise and a multi-class formulation of the same tasks. We evaluate WDC Products using several state-of-the-art matching systems, including Ditto, HierGAT, and R-SupCon. The evaluation shows that all matching systems struggle with unseen entities to varying degrees. It also shows that, for entity matching, contrastive learning is more training-data efficient than cross-encoders.
    ERL-Re$^2$: Efficient Evolutionary Reinforcement Learning with Shared State Representation and Individual Policy Representation. (arXiv:2210.17375v2 [cs.NE] UPDATED)
Deep Reinforcement Learning (Deep RL) and Evolutionary Algorithms (EA) are two major paradigms of policy optimization with distinct learning principles, i.e., gradient-based vs. gradient-free. An appealing research direction is integrating Deep RL and EA to devise new methods by fusing their complementary advantages. However, existing works on combining Deep RL and EA have two common drawbacks: 1) the RL agent and EA agents learn their policies individually, neglecting efficient sharing of useful common knowledge; 2) parameter-level policy optimization guarantees no semantic level of behavior evolution for the EA side. In this paper, we propose Evolutionary Reinforcement Learning with Two-scale State Representation and Policy Representation (ERL-Re$^2$), a novel solution to the aforementioned two drawbacks. The key idea of ERL-Re$^2$ is two-scale representation: all EA and RL policies share the same nonlinear state representation while maintaining individual linear policy representations. The state representation conveys expressive common features of the environment learned by all the agents collectively; the linear policy representation provides a favorable space for efficient policy optimization, where novel behavior-level crossover and mutation operations can be performed. Moreover, the linear policy representation allows convenient generalization of policy fitness with the help of the Policy-extended Value Function Approximator (PeVFA), further improving the sample efficiency of fitness estimation. The experiments on a range of continuous control tasks show that ERL-Re$^2$ consistently outperforms advanced baselines and achieves state-of-the-art (SOTA) performance. Our code is available at https://github.com/yeshenpy/ERL-Re2.
    LTD: Low Temperature Distillation for Robust Adversarial Training. (arXiv:2111.02331v3 [cs.CV] UPDATED)
Adversarial training has been widely used to enhance the robustness of neural network models against adversarial attacks. Despite the popularity of neural network models, a significant gap exists between the natural and robust accuracy of these models. In this paper, we identify that one of the primary reasons for this gap is the common use of one-hot vectors as labels, which hinders the learning process for image recognition. Representing ambiguous images with one-hot vectors is imprecise and may lead the model to suboptimal solutions. To overcome this issue, we propose a novel method called Low Temperature Distillation (LTD) that generates soft labels within a modified knowledge distillation framework. Unlike previous approaches, LTD uses a relatively low temperature in the teacher model, together with fixed but different temperatures for the teacher and student models. This modification boosts the model's robustness without encountering the gradient-masking problem that has been addressed in defensive distillation. The experimental results demonstrate the effectiveness of the proposed LTD method combined with previous techniques, achieving robust accuracy rates of 58.19%, 31.13%, and 42.08% on the CIFAR-10, CIFAR-100, and ImageNet data sets, respectively, without additional unlabeled data.
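A minimal sketch of the distillation loss with separate, fixed teacher and student temperatures follows; the temperature values are illustrative, not the paper's tuned settings:

```python
import torch
import torch.nn.functional as F

def ltd_loss(student_logits, teacher_logits, t_teacher=2.0, t_student=1.0):
    # A relatively low teacher temperature keeps the soft labels sharp,
    # which the paper argues avoids the gradient-masking issue seen in
    # defensive distillation (values here are illustrative).
    targets = F.softmax(teacher_logits / t_teacher, dim=1)
    log_probs = F.log_softmax(student_logits / t_student, dim=1)
    return F.kl_div(log_probs, targets, reduction="batchmean")

# Usage inside a training step (logits are placeholders):
student_logits = torch.randn(8, 10)
teacher_logits = torch.randn(8, 10)
loss = ltd_loss(student_logits, teacher_logits)
```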
    Geometric Autoencoders -- What You See is What You Decode. (arXiv:2306.17638v1 [cs.LG])
Visualization is a crucial step in exploratory data analysis. One possible approach is to train an autoencoder with a low-dimensional latent space. Large network depth and width can help unfold the data. However, such expressive networks can achieve low reconstruction error even when the latent representation is distorted. To avoid such misleading visualizations, we propose first a differential-geometric perspective on the decoder, leading to insightful diagnostics for an embedding's distortion, and second a new regularizer mitigating such distortion. Our "Geometric Autoencoder" avoids stretching the embedding spuriously, so that the visualization captures the data structure more faithfully. It also flags areas where low distortion could not be achieved, thus guarding against misinterpretation.
    A method to integrate and classify normal distributions. (arXiv:2012.14331v9 [stat.ML] UPDATED)
    Univariate and multivariate normal probability distributions are widely used when modeling decisions under uncertainty. Computing the performance of such models requires integrating these distributions over specific domains, which can vary widely across models. Besides some special cases, there exist no general analytical expressions, standard numerical methods or software for these integrals. Here we present mathematical results and open-source software that provide (i) the probability in any domain of a normal in any dimensions with any parameters, (ii) the probability density, cumulative distribution, and inverse cumulative distribution of any function of a normal vector, (iii) the classification errors among any number of normal distributions, the Bayes-optimal discriminability index and relation to the operating characteristic, (iv) dimension reduction and visualizations for such problems, and (v) tests for how reliably these methods may be used on given data. We demonstrate these tools with vision research applications of detecting occluding objects in natural scenes, and detecting camouflage.
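The paper's software handles arbitrary domains and functions of normal vectors; as a simpler illustration, the probability of an axis-aligned rectangle can already be computed from the joint CDF in SciPy via inclusion-exclusion:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
mvn = multivariate_normal(mean=mu, cov=Sigma)

# P(a1 < X1 < b1, a2 < X2 < b2) by inclusion-exclusion over the joint CDF.
a, b = np.array([-1.0, 0.0]), np.array([1.0, 2.0])
p = (mvn.cdf(b)
     - mvn.cdf([a[0], b[1]])
     - mvn.cdf([b[0], a[1]])
     + mvn.cdf(a))
print(p)
```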
    A Gradient Smoothed Functional Algorithm with Truncated Cauchy Random Perturbations for Stochastic Optimization. (arXiv:2208.00290v4 [math.OC] UPDATED)
In this paper, we present a stochastic gradient algorithm for minimizing a smooth objective function that is an expectation over noisy cost samples, where only the latter are observed for any given parameter. Our algorithm employs a gradient estimation scheme with random perturbations, which are formed using the truncated Cauchy distribution from the delta sphere. We analyze the bias and variance of the proposed gradient estimator. Our algorithm is found to be particularly useful when the objective function is non-convex and the parameter dimension is high. From an asymptotic convergence analysis, we establish that our algorithm converges almost surely to the set of stationary points of the objective function and obtain the asymptotic convergence rate. We also show that our algorithm avoids unstable equilibria, implying convergence to local minima. Further, we perform a non-asymptotic convergence analysis of our algorithm. In particular, we establish a non-asymptotic bound for finding an epsilon-stationary point of the non-convex objective function. Finally, we demonstrate numerically through simulations that our algorithm outperforms GSF, SPSA, and RDSA by a significant margin over a few non-convex settings, and we further validate its performance over convex (noisy) objectives.
    Understanding Unfairness via Training Concept Influence. (arXiv:2306.17828v1 [cs.LG])
    Knowing the causes of a model's unfairness helps practitioners better understand their data and algorithms. This is an important yet relatively unexplored task. We look into this problem through the lens of the training data - one of the major sources of unfairness. We ask the following questions: how would a model's fairness performance change if, in its training data, some samples (1) were collected from a different (e.g. demographic) group, (2) were labeled differently, or (3) some features were changed? In other words, we quantify the fairness influence of training samples by counterfactually intervening and changing samples based on predefined concepts, i.e. data attributes such as features (X), labels (Y), or sensitive attributes (A). To calculate a training sample's influence on the model's unfairness w.r.t a concept, we first generate counterfactual samples based on the concept, i.e. the counterfactual versions of the sample if the concept were changed. We then calculate the resulting impact on the unfairness, via influence function, if the counterfactual samples were used in training. Our framework not only helps practitioners understand the observed unfairness and repair their training data, but also leads to many other applications, e.g. detecting mislabeling, fixing imbalanced representations, and detecting fairness-targeted poisoning attacks.
    TD Convergence: An Optimization Perspective. (arXiv:2306.17750v1 [cs.LG])
We study the convergence behavior of the celebrated temporal-difference (TD) learning algorithm. By looking at the algorithm through the lens of optimization, we first argue that TD can be viewed as an iterative optimization algorithm where the function to be minimized changes per iteration. By carefully investigating the divergence displayed by TD on a classical counterexample, we identify two forces that determine the convergent or divergent behavior of the algorithm. We next formalize our discovery in the linear TD setting with quadratic loss and prove that convergence of TD hinges on the interplay between these two forces. We extend this optimization perspective to prove convergence of TD in a much broader setting than just linear approximation and squared loss. Our results provide a theoretical explanation for the successful application of TD in reinforcement learning.
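For reference, the linear TD(0) update that the paper analyzes through this optimization lens looks as follows (step size and features are placeholders):

```python
import numpy as np

def td0_update(theta, phi, reward, phi_next, alpha=0.1, gamma=0.99):
    """One linear TD(0) step: move the value estimate toward the
    bootstrapped target r + gamma * V(s')."""
    td_error = reward + gamma * phi_next @ theta - phi @ theta
    return theta + alpha * td_error * phi

theta = np.zeros(4)
phi, phi_next = np.array([1.0, 0, 0, 0]), np.array([0, 1.0, 0, 0])
theta = td0_update(theta, phi, reward=1.0, phi_next=phi_next)
```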
    Interactive Volume Visualization via Multi-Resolution Hash Encoding based Neural Representation. (arXiv:2207.11620v3 [cs.GR] UPDATED)
    Neural networks have shown great potential in compressing volume data for visualization. However, due to the high cost of training and inference, such volumetric neural representations have thus far only been applied to offline data processing and non-interactive rendering. In this paper, we demonstrate that by simultaneously leveraging modern GPU tensor cores, a native CUDA neural network framework, and a well-designed rendering algorithm with macro-cell acceleration, we can interactively ray trace volumetric neural representations (10-60fps). Our neural representations are also high-fidelity (PSNR > 30dB) and compact (10-1000x smaller). Additionally, we show that it is possible to fit the entire training step inside a rendering loop and skip the pre-training process completely. To support extreme-scale volume data, we also develop an efficient out-of-core training strategy, which allows our volumetric neural representation training to potentially scale up to terascale using only an NVIDIA RTX 3090 workstation.
    Generalized Time Warping Invariant Dictionary Learning for Time Series Classification and Clustering. (arXiv:2306.17690v1 [stat.ML])
Dictionary learning is an effective tool for pattern recognition and classification of time series data. Among various dictionary learning techniques, dynamic time warping (DTW) is commonly used to handle temporal delays, scaling, transformations, and many other kinds of temporal misalignment. However, DTW suffers from overfitting or information loss due to the discrete nature of its alignment of time series data. To address this issue, we propose a generalized time warping invariant dictionary learning algorithm in this paper. Our approach features a generalized time warping operator, which consists of linear combinations of continuous basis functions to facilitate continuous temporal warping. The integration of the proposed operator and dictionary learning is formulated as an optimization problem, where the block coordinate descent method is employed to jointly optimize warping paths, dictionaries, and sparseness coefficients. The optimized results are then used as hyperspace distance measures to feed classification and clustering algorithms. The superiority of the proposed method in terms of dictionary learning, classification, and clustering is validated through ten public datasets in comparison with various benchmark methods.
    Approximating Probability Distributions by using Wasserstein Generative Adversarial Networks. (arXiv:2103.10060v4 [cs.LG] UPDATED)
    Studied here are Wasserstein generative adversarial networks (WGANs) with GroupSort neural networks as their discriminators. It is shown that the error bound of the approximation for the target distribution depends on the width and depth (capacity) of the generators and discriminators and the number of samples in training. A quantified generalization bound is established for the Wasserstein distance between the generated and target distributions. According to the theoretical results, WGANs have a higher requirement for the capacity of discriminators than that of generators, which is consistent with some existing results. More importantly, the results with overly deep and wide (high-capacity) generators may be worse than those with low-capacity generators if discriminators are insufficiently strong. Numerical results obtained using Swiss roll and MNIST datasets confirm the theoretical results.
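The GroupSort discriminator activation is easy to state: sort each contiguous group of pre-activations. A minimal PyTorch version follows (the group size is illustrative; group size 2 is the MaxMin variant):

```python
import torch

def group_sort(x, group_size=2):
    """GroupSort activation: sort each contiguous group of `group_size`
    pre-activations. Sorting preserves gradient norms, which is why it
    suits Lipschitz-constrained WGAN discriminators."""
    b, d = x.shape
    assert d % group_size == 0
    return (x.view(b, d // group_size, group_size)
             .sort(dim=-1).values
             .view(b, d))

x = torch.randn(4, 8)
y = group_sort(x)
```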
    Structure-based Drug Design with Equivariant Diffusion Models. (arXiv:2210.13695v2 [q-bio.BM] UPDATED)
    Structure-based drug design (SBDD) aims to design small-molecule ligands that bind with high affinity and specificity to pre-determined protein targets. In this paper, we formulate SBDD as a 3D-conditional generation problem and present DiffSBDD, an SE(3)-equivariant 3D-conditional diffusion model that generates novel ligands conditioned on protein pockets. Comprehensive in silico experiments demonstrate the efficiency and effectiveness of DiffSBDD in generating novel and diverse drug-like ligands with competitive docking scores. We further explore the flexibility of the diffusion framework for a broader range of tasks in drug design campaigns, such as off-the-shelf property optimization and partial molecular design with inpainting.
    Accuracy Boosters: Epoch-Driven Mixed-Mantissa Block Floating-Point for DNN Training. (arXiv:2211.10737v3 [cs.LG] UPDATED)
    The unprecedented growth in DNN model complexity, size, and amount of training data has led to a commensurate increase in demand for computing and a search for minimal encoding. Recent research advocates Hybrid Block Floating Point (HBFP) to minimize silicon provisioning in accelerators by converting the majority of arithmetic operations in training to 8-bit fixed point. In this paper, we perform a full-scale exploration of the HBFP design space using mathematical tools to study the interplay among various parameters and identify opportunities for even smaller encodings across layers and epochs. Based on our findings, we propose Accuracy Boosters, an epoch-driven mixed-mantissa HBFP technique that uses 6-bit mantissas only in the last epoch and first/last layers, and 4-bit mantissas for $99.7\%$ of all other arithmetic operations in training. Using analytic models, we show Accuracy Boosters enable increasing arithmetic density for an HBFP training accelerator by up to $21.3\times$ compared to FP32 and up to $4.4\times$ compared to another SOTA format Bfloat16, while preserving or outperforming FP32 accuracy.
    ChatGPT for Robotics: Design Principles and Model Abilities. (arXiv:2306.17582v1 [cs.AI])
    This paper presents an experimental study regarding the use of OpenAI's ChatGPT for robotics applications. We outline a strategy that combines design principles for prompt engineering and the creation of a high-level function library which allows ChatGPT to adapt to different robotics tasks, simulators, and form factors. We focus our evaluations on the effectiveness of different prompt engineering techniques and dialog strategies towards the execution of various types of robotics tasks. We explore ChatGPT's ability to use free-form dialog, parse XML tags, and to synthesize code, in addition to the use of task-specific prompting functions and closed-loop reasoning through dialogues. Our study encompasses a range of tasks within the robotics domain, from basic logical, geometrical, and mathematical reasoning all the way to complex domains such as aerial navigation, manipulation, and embodied agents. We show that ChatGPT can be effective at solving several of such tasks, while allowing users to interact with it primarily via natural language instructions. In addition to these studies, we introduce an open-sourced research tool called PromptCraft, which contains a platform where researchers can collaboratively upload and vote on examples of good prompting schemes for robotics applications, as well as a sample robotics simulator with ChatGPT integration, making it easier for users to get started with using ChatGPT for robotics.
    Sequential Recommendation Model for Next Purchase Prediction. (arXiv:2207.06225v2 [cs.IR] UPDATED)
    Timeliness and contextual accuracy of recommendations are increasingly important when delivering contemporary digital marketing experiences. Conventional recommender systems (RS) suggest relevant but time-invariant items to users by accounting for their past purchases. These recommendations only map to customers' general preferences rather than a customer's specific needs immediately preceding a purchase. In contrast, RSs that consider the order of transactions, purchases, or experiences to measure evolving preferences can offer more salient and effective recommendations to customers: Sequential RSs not only benefit from a better behavioral understanding of a user's current needs but also better predictive power. In this paper, we demonstrate and rank the effectiveness of a sequential recommendation system by utilizing a production dataset of over 2.7 million credit card transactions for 46K cardholders. The method first employs an autoencoder on raw transaction data and submits observed transaction encodings to a GRU-based sequential model. The sequential model produces a MAP@1 metric of 47% on the out-of-sample test set, in line with existing research. We also discuss implications for embedding real-time predictions using the sequential RS into Nexus, a scalable, low-latency, event-based digital experience architecture.
    Improving Expert Predictions with Conformal Prediction. (arXiv:2201.12006v5 [cs.LG] UPDATED)
Automated decision support systems promise to help human experts solve multiclass classification tasks more efficiently and accurately. However, existing systems typically require experts to understand when to cede agency to the system or when to exercise their own agency. Otherwise, the experts may be better off solving the classification tasks on their own. In this work, we develop an automated decision support system that, by design, does not require experts to understand when to trust the system in order to improve performance. Rather than providing (single) label predictions and letting experts decide when to trust these predictions, our system provides sets of label predictions constructed using conformal prediction (prediction sets) and forcefully asks experts to predict labels from these sets. By using conformal prediction, our system can precisely trade off the probability that the true label is not in the prediction set, which determines how frequently our system will mislead the experts, against the size of the prediction set, which determines the difficulty of the classification task the experts need to solve using our system. In addition, we develop an efficient and near-optimal search method to find the conformal predictor under which the experts benefit the most from using our system. Simulation experiments using synthetic and real expert predictions demonstrate that our system may help experts make more accurate predictions and is robust to the accuracy of the classifier the conformal predictor relies on.
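Setting aside the paper's search over conformal predictors, the prediction-set construction it builds on is standard split conformal prediction; a minimal sketch with a toy calibration set (scores, class counts, and alpha are illustrative):

```python
import numpy as np

def conformal_prediction_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction with the 1 - p(y|x) score: returns, for
    each test point, a set of labels that contains the true label with
    probability >= 1 - alpha (marginally over calibration and test)."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    qhat = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n,
                       method="higher")
    return [np.where(1.0 - p <= qhat)[0] for p in test_probs]

rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(5), size=200)   # toy classifier outputs
cal_labels = rng.integers(0, 5, size=200)
test_probs = rng.dirichlet(np.ones(5), size=3)
sets = conformal_prediction_sets(cal_probs, cal_labels, test_probs)
```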
    Longitudinal Multimodal Transformer Integrating Imaging and Latent Clinical Signatures From Routine EHRs for Pulmonary Nodule Classification. (arXiv:2304.02836v5 [eess.IV] UPDATED)
    The accuracy of predictive models for solitary pulmonary nodule (SPN) diagnosis can be greatly increased by incorporating repeat imaging and medical context, such as electronic health records (EHRs). However, clinically routine modalities such as imaging and diagnostic codes can be asynchronous and irregularly sampled over different time scales which are obstacles to longitudinal multimodal learning. In this work, we propose a transformer-based multimodal strategy to integrate repeat imaging with longitudinal clinical signatures from routinely collected EHRs for SPN classification. We perform unsupervised disentanglement of latent clinical signatures and leverage time-distance scaled self-attention to jointly learn from clinical signatures expressions and chest computed tomography (CT) scans. Our classifier is pretrained on 2,668 scans from a public dataset and 1,149 subjects with longitudinal chest CTs, billing codes, medications, and laboratory tests from EHRs of our home institution. Evaluation on 227 subjects with challenging SPNs revealed a significant AUC improvement over a longitudinal multimodal baseline (0.824 vs 0.752 AUC), as well as improvements over a single cross-section multimodal scenario (0.809 AUC) and a longitudinal imaging-only scenario (0.741 AUC). This work demonstrates significant advantages with a novel approach for co-learning longitudinal imaging and non-imaging phenotypes with transformers. Code available at https://github.com/MASILab/lmsignatures.
    Robust Implicit Regularization via Weight Normalization. (arXiv:2305.05448v2 [cs.LG] UPDATED)
Overparameterized models may have many interpolating solutions; implicit regularization refers to the hidden preference of a particular optimization method towards a certain interpolating solution among the many. An established line of work has shown that (stochastic) gradient descent tends to have an implicit bias towards low-rank and/or sparse solutions when used to train deep linear networks, explaining to some extent why overparameterized neural network models trained by gradient descent tend to generalize well in practice. However, existing theory for square-loss objectives often requires very small initialization of the trainable weights, which is at odds with the larger scale at which weights are initialized in practice for faster convergence and better generalization. In this paper, we aim to close this gap by incorporating and analyzing gradient descent with weight normalization, where the weight vector is reparameterized in terms of polar coordinates and gradient descent is applied to the polar coordinates. By analyzing key invariants of the gradient flow and using Lojasiewicz's Theorem, we show that weight normalization also has an implicit bias towards sparse solutions in the diagonal linear model, but that, in contrast to plain gradient descent, weight normalization enables a robust bias that persists even if the weights are initialized at practically large scale. Experiments suggest that the gains in both convergence speed and robustness of the implicit bias are improved dramatically by using weight normalization in overparameterized diagonal linear network models.
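The reparameterization being analyzed is simple to state: train the magnitude and direction of the weight vector instead of the weights themselves. A minimal sketch on a linear model follows (dimensions, initialization, and step count are illustrative, and this does not reproduce the paper's diagonal-network experiments):

```python
import torch

# Linear model y = <w, x> with w reparameterized as w = g * v / ||v||;
# gradient descent runs on the "polar" parameters (g, v), not on w.
d = 20
g = torch.ones(1, requires_grad=True)      # magnitude
v = torch.randn(d, requires_grad=True)     # direction, large-scale init is fine
X, y = torch.randn(100, d), torch.randn(100)

opt = torch.optim.SGD([g, v], lr=0.01)
for _ in range(200):
    w = g * v / v.norm()
    loss = ((X @ w - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```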
    Towards Complex Dynamic Physics System Simulation with Graph Neural ODEs. (arXiv:2305.12334v4 [cs.LG] UPDATED)
The great learning ability of deep learning models facilitates our comprehension of the real physical world, making learning to simulate complicated particle systems a promising endeavour. However, the complex laws of the physical world pose significant challenges to learning-based simulation, such as the varying spatial dependencies between interacting particles and the varying temporal dependencies between particle-system states at different time stamps, which dominate particles' interaction behaviour and the physical systems' evolution patterns. Existing learning-based simulation methods fail to fully account for these complexities, making them unable to yield satisfactory simulations. To better comprehend the complex physical laws, this paper proposes a novel learning-based simulation model, Graph Networks with Spatial-Temporal neural Ordinary Differential Equations (GNSTODE), that characterizes the varying spatial and temporal dependencies in particle systems within a unified end-to-end framework. Trained on real-world particle-particle interaction observations, GNSTODE is able to simulate any possible particle system with high precision. We empirically evaluate GNSTODE's simulation performance on two real-world particle systems, Gravity and Coulomb, with varying levels of spatial and temporal dependencies. The results show that the proposed GNSTODE yields significantly better simulations than state-of-the-art learning-based simulation methods, which proves that GNSTODE can serve as an effective solution to particle simulation in real-world applications.
    Graphtester: Exploring Theoretical Boundaries of GNNs on Graph Datasets. (arXiv:2306.17482v1 [cs.LG])
    Graph Neural Networks (GNNs) have emerged as a powerful tool for learning from graph-structured data. However, even state-of-the-art architectures have limitations on what structures they can distinguish, imposing theoretical limits on what the networks can achieve on different datasets. In this paper, we provide a new tool called Graphtester for a comprehensive analysis of the theoretical capabilities of GNNs for various datasets, tasks, and scores. We use Graphtester to analyze over 40 different graph datasets, determining upper bounds on the performance of various GNNs based on the number of layers. Further, we show that the tool can also be used for Graph Transformers using positional node encodings, thereby expanding its scope. Finally, we demonstrate that features generated by Graphtester can be used for practical applications such as Graph Transformers, and provide a synthetic dataset to benchmark node and edge features, such as positional encodings. The package is freely available at the following URL: https://github.com/meakbiyik/graphtester.
    Look, Remember and Reason: Visual Reasoning with Grounded Rationales. (arXiv:2306.17778v1 [cs.CV])
Large language models have recently shown human-level performance on a variety of reasoning tasks. However, the ability of these models to perform complex visual reasoning has not yet been studied in detail. A key challenge in many visual reasoning tasks is that the visual information needs to be tightly integrated in the reasoning process. We propose to address this challenge by drawing inspiration from human visual problem solving, which depends on a variety of low-level visual capabilities. It can often be cast as the three-step process of "Look, Remember, Reason": visual information is incrementally extracted using low-level visual routines in a step-by-step fashion until a final answer is reached. We follow the same paradigm to enable existing large language models, with minimal changes to the architecture, to solve visual reasoning problems. To this end, we introduce rationales over the visual input that allow us to integrate low-level visual capabilities, such as object recognition and tracking, as surrogate tasks. We show competitive performance on diverse visual reasoning tasks from the CLEVR, CATER, and ACRE datasets over state-of-the-art models designed specifically for these tasks.
    Private Online Prediction from Experts: Separations and Faster Rates. (arXiv:2210.13537v3 [cs.LG] UPDATED)
    Online prediction from experts is a fundamental problem in machine learning and several works have studied this problem under privacy constraints. We propose and analyze new algorithms for this problem that improve over the regret bounds of the best existing algorithms for non-adaptive adversaries. For approximate differential privacy, our algorithms achieve regret bounds of $\tilde{O}(\sqrt{T \log d} + \log d/\varepsilon)$ for the stochastic setting and $\tilde{O}(\sqrt{T \log d} + T^{1/3} \log d/\varepsilon)$ for oblivious adversaries (where $d$ is the number of experts). For pure DP, our algorithms are the first to obtain sub-linear regret for oblivious adversaries in the high-dimensional regime $d \ge T$. Moreover, we prove new lower bounds for adaptive adversaries. Our results imply that unlike the non-private setting, there is a strong separation between the optimal regret for adaptive and non-adaptive adversaries for this problem. Our lower bounds also show a separation between pure and approximate differential privacy for adaptive adversaries where the latter is necessary to achieve the non-private $O(\sqrt{T})$ regret.
    Why Deep Models Often cannot Beat Non-deep Counterparts on Molecular Property Prediction?. (arXiv:2306.17702v1 [cs.LG])
Molecular property prediction (MPP) is a crucial task in the drug discovery pipeline, which has recently gained considerable attention thanks to advances in deep neural networks. However, recent research has revealed that deep models struggle to beat traditional non-deep ones on MPP. In this study, we benchmark 12 representative models (3 non-deep models and 9 deep models) on 14 molecule datasets. Through the most comprehensive study to date, we make the following key observations: (i) deep models are generally unable to outperform non-deep ones; (ii) the failure of deep models on MPP cannot be solely attributed to the small size of molecular datasets; what matters is the irregular molecule data pattern; (iii) in particular, tree models using molecular fingerprints as inputs tend to perform better than other competitors. Furthermore, we conduct extensive empirical investigations into the unique patterns of molecule data and the inductive biases of various models underlying these phenomena.
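Observation (iii) corresponds to a baseline that is easy to reproduce in outline: Morgan fingerprints from RDKit fed to a tree model. The molecules, labels, and hyperparameters below are toy placeholders:

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier

smiles = ["CCO", "c1ccccc1", "CC(=O)O", "CCN"]   # toy molecules
labels = [0, 1, 0, 1]                            # toy property labels

def morgan_fp(smi, radius=2, n_bits=2048):
    # Circular (Morgan/ECFP-like) fingerprint as a fixed-length bit vector.
    mol = Chem.MolFromSmiles(smi)
    return np.array(AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits))

X = np.stack([morgan_fp(s) for s in smiles])
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, labels)
```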
    Enhancing training of physics-informed neural networks using domain-decomposition based preconditioning strategies. (arXiv:2306.17648v1 [math.NA])
We propose to enhance the training of physics-informed neural networks (PINNs). To this aim, we introduce nonlinear additive and multiplicative preconditioning strategies for the widely used L-BFGS optimizer. The nonlinear preconditioners are constructed by utilizing the Schwarz domain-decomposition framework, where the parameters of the network are decomposed in a layer-wise manner. Through a series of numerical experiments, we demonstrate that both additive and multiplicative preconditioners significantly improve the convergence of the standard L-BFGS optimizer, while providing more accurate solutions of the underlying partial differential equations. Moreover, the additive preconditioner is inherently parallel, thus giving rise to a novel approach to model parallelism.
    Bounded (O(1)) Regret Recommendation Learning via Synthetic Controls Oracle. (arXiv:2301.12571v2 [cs.LG] UPDATED)
    In online exploration systems where users with fixed preferences repeatedly arrive, it has recently been shown that O(1), i.e., bounded regret, can be achieved when the system is modeled as a linear contextual bandit. This result may be of interest for recommender systems, where the popularity of their items is often short-lived, as the exploration itself may be completed quickly before potential long-run non-stationarities come into play. However, in practice, exact knowledge of the linear model is difficult to justify. Furthermore, potential existence of unobservable covariates, uneven user arrival rates, interpretation of the necessary rank condition, and users opting out of private data tracking all need to be addressed for practical recommender system applications. In this work, we conduct a theoretical study to address all these issues while still achieving bounded regret. Aside from proof techniques, the key differentiating assumption we make here is the presence of effective Synthetic Control Methods (SCM), which are shown to be a practical relaxation of the exact linear model knowledge assumption. We verify our theoretical bounded regret result using a minimal simulation experiment.
    Impact of Noise on Calibration and Generalisation of Neural Networks. (arXiv:2306.17630v1 [cs.LG])
    Noise injection and data augmentation strategies have been effective for enhancing the generalisation and robustness of neural networks (NNs). Certain types of noise such as label smoothing and MixUp have also been shown to improve calibration. Since noise can be added in various stages of the NN's training, it motivates the question of when and where the noise is the most effective. We study a variety of noise types to determine how much they improve calibration and generalisation, and under what conditions. More specifically we evaluate various noise-injection strategies in both in-distribution (ID) and out-of-distribution (OOD) scenarios. The findings highlight that activation noise was the most transferable and effective in improving generalisation, while input augmentation noise was prominent in improving calibration on OOD but not necessarily ID data.
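Activation noise, the variant this study found most transferable, amounts to perturbing hidden activations during training only; one possible PyTorch formulation (the noise scale is illustrative) is:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyReLU(nn.Module):
    """ReLU followed by additive Gaussian activation noise, applied only
    during training (sigma is an illustrative choice)."""
    def __init__(self, sigma=0.1):
        super().__init__()
        self.sigma = sigma

    def forward(self, x):
        x = F.relu(x)
        if self.training:
            x = x + self.sigma * torch.randn_like(x)
        return x

model = nn.Sequential(nn.Linear(32, 64), NoisyReLU(), nn.Linear(64, 10))
```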
    Achieving RGB-D level Segmentation Performance from a Single ToF Camera. (arXiv:2306.17636v1 [cs.CV])
    Depth is a very important modality in computer vision, typically used as complementary information to RGB, provided by RGB-D cameras. In this work, we show that it is possible to obtain the same level of accuracy as RGB-D cameras on a semantic segmentation task using infrared (IR) and depth images from a single Time-of-Flight (ToF) camera. In order to fuse the IR and depth modalities of the ToF camera, we introduce a method utilizing depth-specific convolutions in a multi-task learning framework. In our evaluation on an in-car segmentation dataset, we demonstrate the competitiveness of our method against the more costly RGB-D approaches.
    Stay on topic with Classifier-Free Guidance. (arXiv:2306.17806v1 [cs.CL])
Classifier-Free Guidance (CFG) has recently emerged in text-to-image generation as a lightweight technique to encourage prompt-adherence in generations. In this work, we demonstrate that CFG can be used broadly as an inference-time technique in pure language modeling. We show that CFG (1) improves the performance of Pythia, GPT-2 and LLaMA-family models across an array of tasks: Q&A, reasoning, code generation, and machine translation, achieving SOTA on LAMBADA with LLaMA-7B over PaLM-540B; (2) brings improvements equivalent to a model with twice the parameter-count; (3) can stack alongside other inference-time methods like Chain-of-Thought and Self-Consistency, yielding further improvements in difficult tasks; (4) can be used to increase the faithfulness and coherence of assistants in challenging form-driven and content-driven prompts: in a human evaluation we show a 75% preference for GPT4All using CFG over baseline.
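At inference time, CFG for language modeling combines two forward passes per step, one with and one without the conditioning prompt. A sketch assuming a HuggingFace-style causal LM whose outputs expose .logits (the guidance strength is illustrative):

```python
import torch

def cfg_next_token_logits(model, cond_ids, uncond_ids, gamma=1.5):
    # Two forward passes: with the conditioning prompt and without it.
    # gamma = 1 recovers ordinary decoding; larger gamma pushes the
    # distribution toward the prompt-conditioned one.
    with torch.no_grad():
        cond = model(cond_ids).logits[:, -1, :]
        uncond = model(uncond_ids).logits[:, -1, :]
    return uncond + gamma * (cond - uncond)
```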
    High Dimensional Data Enrichment: Interpretable, Fast, and Data-Efficient. (arXiv:1806.04047v4 [stat.ML] UPDATED)
We consider the problem of multi-task learning in the high-dimensional setting. In particular, we introduce an estimator and investigate its statistical and computational properties for the problem of multiple connected linear regressions known as Data Enrichment/Sharing. The between-task connections are captured by a cross-task common parameter, which gets refined by per-task individual parameters. Any convex function, e.g., a norm, can characterize the structure of both common and individual parameters. We delineate the sample complexity of our estimator and provide a high-probability non-asymptotic bound for the estimation error of all parameters under a geometric condition. We show that the recovery of the common parameter benefits from all of the pooled samples. We propose an iterative estimation algorithm with a geometric convergence rate and supplement our theoretical analysis with experiments on synthetic data. Overall, we present a first thorough statistical and computational analysis of inference in the data-sharing model.
    Physics-informed invertible neural network for the Koopman operator learning. (arXiv:2306.17396v1 [math.NA])
    In Koopman operator theory, a finite-dimensional nonlinear system is transformed into an infinite but linear system using a set of observable functions. However, manually selecting observable functions that span the invariant subspace of the Koopman operator based on prior knowledge is inefficient and challenging, particularly when little or no information is available about the underlying systems. Furthermore, current methodologies tend to disregard the importance of the invertibility of observable functions, which leads to inaccurate results. To address these challenges, we propose the so-called FlowDMD, a Flow-based Dynamic Mode Decomposition that utilizes the Coupling Flow Invertible Neural Network (CF-INN) framework. FlowDMD leverages the intrinsically invertible characteristics of the CF-INN to learn the invariant subspaces of the Koopman operator and accurately reconstruct state variables. Numerical experiments demonstrate the superior performance of our algorithm compared to state-of-the-art methodologies.
    The Implicit Bias of Minima Stability in Multivariate Shallow ReLU Networks. (arXiv:2306.17499v1 [cs.LG])
We study the type of solutions to which stochastic gradient descent converges when used to train a single hidden-layer multivariate ReLU network with the quadratic loss. Our results are based on a dynamical stability analysis. In the univariate case, it was shown that linearly stable minima correspond to network functions (predictors) whose second derivative has a bounded weighted $L^1$ norm. Notably, the bound gets smaller as the step size increases, implying that training with a large step size leads to 'smoother' predictors. Here we generalize this result to the multivariate case, showing that a similar result applies to the Laplacian of the predictor. We demonstrate the tightness of our bound on the MNIST dataset, and show that it accurately captures the behavior of the solutions as a function of the step size. Additionally, we prove a depth separation result on the approximation power of ReLU networks corresponding to stable minima of the loss. Specifically, although shallow ReLU networks are universal approximators, we prove that stable shallow networks are not: there is a function that cannot be well approximated by stable single-hidden-layer ReLU networks trained with a non-vanishing step size, while the same function can be realized as a stable two-hidden-layer ReLU network. Finally, we prove that if a function is sufficiently smooth (in a Sobolev sense), then it can be approximated arbitrarily well using shallow ReLU networks that correspond to stable solutions of gradient descent.
    Beyond Neural-on-Neural Approaches to Speaker Gender Protection. (arXiv:2306.17700v1 [eess.AS])
    Recent research has proposed approaches that modify speech to defend against gender inference attacks. The goal of these protection algorithms is to control the availability of information about a speaker's gender, a privacy-sensitive attribute. Currently, the common practice for developing and testing gender protection algorithms is "neural-on-neural", i.e., perturbations are generated and tested with a neural network. In this paper, we propose to go beyond this practice to strengthen the study of gender protection. First, we demonstrate the importance of testing gender inference attacks that are based on speech features historically developed by speech scientists, alongside the conventionally used neural classifiers. Next, we argue that researchers should use speech features to gain insight into how protective modifications change the speech signal. Finally, we point out that gender-protection algorithms should be compared with novel "vocal adversaries", human-executed voice adaptations, in order to improve interpretability and enable before-the-mic protection.
    Scaling Model Checking for DNN Analysis via State-Space Reduction and Input Segmentation (Extended Version). (arXiv:2306.17323v1 [cs.LG])
Owing to their remarkable learning capabilities and performance in real-world applications, the use of machine learning systems based on Neural Networks (NNs) has been continuously increasing. However, various case studies and empirical findings in the literature suggest that slight variations to NN inputs can lead to erroneous and undesirable NN behavior. This has led to considerable interest in their formal analysis, aiming to provide guarantees regarding a given NN's behavior. Existing frameworks provide robustness and/or safety guarantees for trained NNs using satisfiability solving and linear programming. We previously proposed FANNet, the first model-checking-based framework for analyzing a broader range of NN properties. However, the state-space explosion associated with model checking entails a scalability problem, making FANNet applicable only to small NNs. This work develops state-space reduction and input segmentation approaches to improve the scalability and timing efficiency of formal NN analysis. Compared to the state-of-the-art FANNet, this enables our new model-checking-based framework to reduce the verification's timing overhead by a factor of up to 8000, making the framework applicable to NNs with approximately $80$ times more network parameters. This in turn allows the analysis of NN safety properties using the new framework, in addition to all the NN properties already covered by FANNet. The framework is shown to efficiently analyze properties of NNs trained on healthcare datasets as well as the well-acknowledged ACAS Xu NNs.
    Class-Incremental Learning using Diffusion Model for Distillation and Replay. (arXiv:2306.17560v1 [cs.LG])
    Class-incremental learning aims to learn new classes in an incremental fashion without forgetting the previously learned ones. Several research works have shown how additional data can be used by incremental models to help mitigate catastrophic forgetting. In this work, following the recent breakthrough in text-to-image generative models and their wide distribution, we propose the use of a pretrained Stable Diffusion model as a source of additional data for class-incremental learning. Compared to competitive methods that rely on external, often unlabeled, datasets of real images, our approach can generate synthetic samples belonging to the same classes as the previously encountered images. This allows us to use those additional data samples not only in the distillation loss but also for replay in the classification loss. Experiments on the competitive benchmarks CIFAR100, ImageNet-Subset, and ImageNet demonstrate how this new approach can be used to further improve the performance of state-of-the-art methods for class-incremental learning on large scale datasets.
    Augmenting Holistic Review in University Admission using Natural Language Processing for Essays and Recommendation Letters. (arXiv:2306.17575v1 [cs.CL])
University admission at many highly selective institutions uses a holistic review process, where all aspects of the application, including protected attributes (e.g., race, gender), grades, essays, and recommendation letters are considered in order to compose an excellent and diverse class. In this study, we empirically evaluate how influential protected attributes are for predicting admission decisions using a machine learning (ML) model, and to what extent textual information (e.g., personal essay, teacher recommendation) can substitute for the loss of protected attributes in the model. Using data from 14,915 applicants to an undergraduate admission office at a selective U.S. institution in the 2022-2023 cycle, we find that the exclusion of protected attributes from the ML model leads to substantially reduced admission-prediction performance. The inclusion of textual information via both a TF-IDF representation and a Latent Dirichlet Allocation (LDA) model partially restores model performance, but does not appear to provide a full substitute for admitting a similarly diverse class. In particular, while the text helps with gender diversity, the proportion of URM applicants is severely impacted by the exclusion of protected attributes, and the inclusion of new attributes generated from the textual information does not recover this performance loss.
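Both text representations used are standard; a minimal sketch of building TF-IDF and LDA-topic features and feeding them to a classifier follows (scikit-learn's LDA expects raw counts, hence the separate count matrix; all data and hyperparameters are placeholders, not the study's pipeline):

```python
import numpy as np
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

essays = ["first toy essay text", "second toy essay text"]  # placeholders
admitted = np.array([1, 0])                                 # placeholder labels

tfidf = TfidfVectorizer(max_features=5000).fit_transform(essays)
counts = CountVectorizer(max_features=5000).fit_transform(essays)
topics = LatentDirichletAllocation(n_components=10,
                                   random_state=0).fit_transform(counts)

X = hstack([tfidf, topics])   # combined textual features
clf = LogisticRegression(max_iter=1000).fit(X, admitted)
```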
    Lifelong Learning for Neural powered Mixed Integer Programming. (arXiv:2208.12226v3 [math.OC] UPDATED)
Mixed Integer Programs (MIPs) are typically solved by the branch-and-bound algorithm. Recently, learning to imitate fast approximations of the expert strong-branching heuristic has gained attention due to its success in reducing the running time for solving MIPs. However, existing learning-to-branch methods assume that the entire training data is available in a single session of training. This assumption is often untrue, and if the training data is supplied in a continual fashion over time, existing techniques suffer from catastrophic forgetting. In this work, we study the hitherto unexplored paradigm of Lifelong Learning to Branch on Mixed Integer Programs. To mitigate catastrophic forgetting, we propose LIMIP, which is powered by the idea of modeling an MIP instance as a bipartite graph, which we map to an embedding space using a bipartite Graph Attention Network. This rich embedding space avoids catastrophic forgetting through the application of knowledge distillation and elastic weight consolidation, wherein we learn which parameters are key to retaining efficacy and therefore protect them from significant drift. We evaluate LIMIP on a series of NP-hard problems and establish that, in comparison to existing baselines, LIMIP is up to 50% better when confronted with lifelong learning.
    Hierarchical Bayesian Regression for Multi-Location Sales Transaction Forecasting. (arXiv:2306.17795v1 [stat.AP])
The features in many prediction models naturally take the form of a hierarchy. The lower levels represent individuals or events. These units group naturally into locations and intervals or other aggregates, often at multiple levels. Levels of grouping may intersect and join, much as relational database tables do. Besides representing the structure of the data, predictive features in hierarchical models can be assigned to their proper levels. Such models lend themselves to hierarchical Bayes solution methods that "share" results of inference between groups, generalizing over the two extremes of fitting an individual model for each group versus one model that aggregates all groups. In this paper we show our work in progress applying a hierarchical Bayesian model to forecast purchases throughout the day at store franchises, with groupings over locations and days of the week. We demonstrate using the stan package on individual sales transaction data collected over the course of a year. We show how this solves the dilemma of having limited data and hence modest accuracy for each day and location, while being able to scale to a large number of locations with improved accuracy.
    Why Shallow Networks Struggle with Approximating and Learning High Frequency: A Numerical Study. (arXiv:2306.17301v1 [cs.LG])
    In this work, a comprehensive numerical study involving analysis and experiments shows why a two-layer neural network has difficulties handling high frequencies in approximation and learning when machine precision and computation cost are important factors in real practice. In particular, the following fundamental computational issues are investigated: (1) the best accuracy one can achieve given a finite machine precision, (2) the computation cost to achieve a given accuracy, and (3) stability with respect to perturbations. The key to the study is the spectral analysis of the corresponding Gram matrix of the activation functions which also shows how the properties of the activation function play a role in the picture.
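The kind of spectral analysis described can be sketched directly: form the Gram matrix of ReLU activations on a grid of inputs and inspect its eigenvalue decay (widths, scales, and the input grid below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 1000                      # sample points, hidden neurons
x = np.linspace(-1, 1, n)[:, None]    # 1-D input grid
W = rng.standard_normal((1, m))       # random first-layer weights
b = rng.standard_normal(m)

Phi = np.maximum(x @ W + b, 0.0)      # ReLU feature matrix
G = Phi @ Phi.T / m                   # Gram matrix of the activations

# Fast eigenvalue decay is what limits how well high-frequency
# components can be resolved at finite machine precision.
eigvals = np.linalg.eigvalsh(G)[::-1]
print(eigvals[:10] / eigvals[0])
```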
    The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks. (arXiv:2306.17844v1 [cs.LG])
    Do neural networks, trained on well-understood algorithmic tasks, reliably rediscover known algorithms for solving those tasks? Several recent studies, on tasks ranging from group arithmetic to in-context linear regression, have suggested that the answer is yes. Using modular addition as a prototypical problem, we show that algorithm discovery in neural networks is sometimes more complex. Small changes to model hyperparameters and initializations can induce the discovery of qualitatively different algorithms from a fixed training set, and even parallel implementations of multiple such algorithms. Some networks trained to perform modular addition implement a familiar Clock algorithm; others implement a previously undescribed, less intuitive, but comprehensible procedure which we term the Pizza algorithm, or a variety of even more complex procedures. Our results show that even simple learning problems can admit a surprising diversity of solutions, motivating the development of new tools for characterizing the behavior of neural networks across their algorithmic phase space.
    Efficient uniform approximation using Random Vector Functional Link networks. (arXiv:2306.17501v1 [stat.ML])
    A Random Vector Functional Link (RVFL) network is a depth-2 neural network with random inner weights and biases. As only the outer weights of such architectures need to be learned, the learning process boils down to a linear optimization task, allowing one to sidestep the pitfalls of nonconvex optimization problems. In this paper, we prove that an RVFL with ReLU activation functions can approximate Lipschitz continuous functions provided its hidden layer is exponentially wide in the input dimension. Although it has been established before that such approximation can be achieved in $L_2$ sense, we prove it for $L_\infty$ approximation error and Gaussian inner weights. To the best of our knowledge, our result is the first of this kind. We give a nonasymptotic lower bound for the number of hidden layer nodes, depending on, among other things, the Lipschitz constant of the target function, the desired accuracy, and the input dimension. Our method of proof is rooted in probability theory and harmonic analysis.
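The training procedure the abstract describes fits in a few lines, since only the outer weights are learned. A numpy sketch with Gaussian inner weights and ReLU activations, matching the setting of the result (width and target function are illustrative):

```python
import numpy as np

def rvfl_fit(X, y, width=2000, seed=0):
    """Train an RVFL: random Gaussian inner weights/biases, ReLU hidden
    layer, outer weights fit by linear least squares (the only learned part)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], width))
    b = rng.standard_normal(width)
    H = np.maximum(X @ W + b, 0.0)
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return W, b, beta

def rvfl_predict(X, W, b, beta):
    return np.maximum(X @ W + b, 0.0) @ beta

# Example: approximate a Lipschitz function on [0, 1]^2.
rng = np.random.default_rng(1)
X = rng.random((500, 2))
y = np.abs(X[:, 0] - X[:, 1])        # Lipschitz target
W, b, beta = rvfl_fit(X, y)
print(np.max(np.abs(rvfl_predict(X, W, b, beta) - y)))  # sup-norm train error
```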
    A Survey on Blockchain-Based Federated Learning and Data Privacy. (arXiv:2306.17338v1 [cs.LG])
    Federated learning is a decentralized machine learning paradigm that allows multiple clients to collaborate by leveraging local computational power and the models transmission. This method reduces the costs and privacy concerns associated with centralized machine learning methods while ensuring data privacy by distributing training data across heterogeneous devices. On the other hand, federated learning has the drawback of data leakage due to the lack of privacy-preserving mechanisms employed during storage, transfer, and sharing, thus posing significant risks to data owners and suppliers. Blockchain technology has emerged as a promising technology for offering secure data-sharing platforms in federated learning, especially in Industrial Internet of Things (IIoT) settings. This survey aims to compare the performance and security of various data privacy mechanisms adopted in blockchain-based federated learning architectures. We conduct a systematic review of existing literature on secure data-sharing platforms for federated learning provided by blockchain technology, providing an in-depth overview of blockchain-based federated learning, its essential components, and discussing its principles, and potential applications. The primary contribution of this survey paper is to identify critical research questions and propose potential directions for future research in blockchain-based federated learning.
    Variational principle to regularize machine-learned density functionals: the non-interacting kinetic-energy functional. (arXiv:2306.17587v1 [physics.chem-ph])
    Practical density functional theory (DFT) owes its success to the groundbreaking work of Kohn and Sham that introduced the exact calculation of the non-interacting kinetic energy of the electrons using an auxiliary mean-field system. However, the full power of DFT will not be unleashed until the exact relationship between the electron density and the non-interacting kinetic energy is found. Various attempts have been made to approximate this functional, similar to the exchange--correlation functional, with much less success due to the larger contribution of kinetic energy and its more non-local nature. In this work we propose a new and efficient regularization method to train density functionals based on deep neural networks, with particular interest in the kinetic-energy functional. The method is tested on (effectively) one-dimensional systems, including the hydrogen chain, non-interacting electrons, and atoms of the first two periods, with excellent results. For the atomic systems, the generalizability of the regularization method is demonstrated by training also an exchange--correlation functional, and the contrasting nature of the two functionals is discussed from a machine-learning perspective.
    Bayesian Optimization with Formal Safety Guarantees via Online Conformal Prediction. (arXiv:2306.17815v1 [cs.LG])
Black-box zeroth-order optimization is a central primitive for applications in fields as diverse as finance, physics, and engineering. In a common formulation of this problem, a designer sequentially attempts candidate solutions, receiving noisy feedback on the value of each attempt from the system. In this paper, we study scenarios in which feedback is also provided on the safety of the attempted solution, and the optimizer is constrained to limit the number of unsafe solutions that are tried throughout the optimization process. Focusing on methods based on Bayesian optimization (BO), prior art has introduced an optimization scheme -- referred to as SAFEOPT -- that is guaranteed not to select any unsafe solution with a controllable probability over feedback noise as long as strict assumptions on the safety constraint function are met. In this paper, a novel BO-based approach is introduced that satisfies safety requirements irrespective of properties of the constraint function. This strong theoretical guarantee is obtained at the cost of allowing for an arbitrary, controllable but non-zero, rate of violation of the safety constraint. The proposed method, referred to as SAFE-BOCP, builds on online conformal prediction (CP) and is specialized to the cases in which feedback on the safety constraint is either noiseless or noisy. Experimental results on synthetic and real-world data validate the advantages and flexibility of the proposed SAFE-BOCP.
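The online conformal prediction ingredient can be illustrated with the standard adaptive update: a safety margin grows after each violation and relaxes slowly otherwise, so the long-run violation rate tracks a target level. A self-contained toy sketch (the sigmoid "environment", step size, and target rate are our assumptions, not SAFE-BOCP itself):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, gamma = 0.1, 0.05   # target violation rate, step size (illustrative)
lam, errs = 0.0, []

for t in range(20_000):
    # Toy stand-in for a safety check: a larger margin lam makes the
    # optimizer more conservative, so unsafe selections become rarer.
    unsafe = rng.random() < 1.0 / (1.0 + np.exp(lam))
    errs.append(unsafe)
    # Online conformal-style update: expand the safety margin after a
    # violation, relax it slowly otherwise.
    lam += gamma * (unsafe - alpha)

print(np.mean(errs))   # long-run violation rate tracks alpha = 0.1
```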
    Designing Stable Neural Networks using Convex Analysis and ODEs. (arXiv:2306.17332v1 [cs.LG])
Motivated by classical work on the numerical integration of ordinary differential equations, we present a ResNet-styled neural network architecture that encodes non-expansive (1-Lipschitz) operators, as long as the spectral norms of the weights are appropriately constrained. This is to be contrasted with the ordinary ResNet architecture which, even if the spectral norms of the weights are constrained, has a Lipschitz constant that, in the worst case, grows exponentially with the depth of the network. Further analysis of the proposed architecture shows that the spectral norms of the weights can be further constrained to ensure that the network is an averaged operator, making it a natural candidate for a learned denoiser in Plug-and-Play algorithms. Using a novel adaptive way of enforcing the spectral norm constraints, we show that, even with these constraints, it is possible to train performant networks. The proposed architecture is applied to the problem of adversarially robust image classification, to image denoising, and finally to the inverse problem of deblurring.
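One standard way to build a residual step that is provably 1-Lipschitz, in the spirit of the construction described, is a gradient step on a convex smooth function with spectrally normalized weights. A numpy sketch under that reading (the parameterization is our assumption; the paper's exact architecture may differ):

```python
import numpy as np

def spectral_normalize(W, iters=50):
    """Power iteration to estimate the spectral norm, then rescale so
    that ||W||_2 <= 1."""
    u = np.random.default_rng(0).standard_normal(W.shape[0])
    for _ in range(iters):
        v = W.T @ u; v /= np.linalg.norm(v)
        u = W @ v;  u /= np.linalg.norm(u)
    sigma = u @ W @ v
    return W / max(sigma, 1.0)

def nonexpansive_block(x, W, b, tau=1.0):
    """Residual step x - tau * W^T relu(W x + b). With ||W||_2 <= 1 and
    tau <= 2 this is a gradient step on a convex, 1-smooth function,
    hence 1-Lipschitz in x (our reading of the construction)."""
    return x - tau * W.T @ np.maximum(W @ x + b, 0.0)

W = spectral_normalize(np.random.default_rng(1).standard_normal((64, 32)))
b = np.zeros(64)
x1, x2 = np.random.default_rng(2).standard_normal((2, 32))
y1, y2 = nonexpansive_block(x1, W, b), nonexpansive_block(x2, W, b)
print(np.linalg.norm(y1 - y2) <= np.linalg.norm(x1 - x2) + 1e-9)  # True
```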
    Navigation of micro-robot swarms for targeted delivery using reinforcement learning. (arXiv:2306.17598v1 [cs.RO])
Micro-robotics is quickly emerging as a promising technological solution for many medical treatments, with a focus on targeted drug delivery. Micro-robots are effective when working in swarms, but controlling each robot individually is mostly infeasible owing to their minute size. Controlling many robots with a single controller is thus important, and artificial intelligence can help us perform this task successfully. In this work, we use the Reinforcement Learning (RL) algorithms Proximal Policy Optimization (PPO) and Robust Policy Optimization (RPO) to navigate swarms of 4, 9 and 16 microswimmers under hydrodynamic effects, controlled by their orientation, towards a circular absorbing target. We examine the performance of both PPO and RPO under limited state information and also test their robustness to random target locations and sizes. We use curriculum learning to improve performance, and demonstrate this by learning to navigate a swarm of 25 swimmers and steering the swarm, exemplifying the manoeuvring capabilities of the RL model.
    Thompson sampling for improved exploration in GFlowNets. (arXiv:2306.17693v1 [cs.LG])
    Generative flow networks (GFlowNets) are amortized variational inference algorithms that treat sampling from a distribution over compositional objects as a sequential decision-making problem with a learnable action policy. Unlike other algorithms for hierarchical sampling that optimize a variational bound, GFlowNet algorithms can stably run off-policy, which can be advantageous for discovering modes of the target distribution. Despite this flexibility in the choice of behaviour policy, the optimal way of efficiently selecting trajectories for training has not yet been systematically explored. In this paper, we view the choice of trajectories for training as an active learning problem and approach it using Bayesian techniques inspired by methods for multi-armed bandits. The proposed algorithm, Thompson sampling GFlowNets (TS-GFN), maintains an approximate posterior distribution over policies and samples trajectories from this posterior for training. We show in two domains that TS-GFN yields improved exploration and thus faster convergence to the target distribution than the off-policy exploration strategies used in past work.
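The engine behind TS-GFN is classical Thompson sampling: draw one sample from the posterior, act greedily with respect to that sample, update. A Beta-Bernoulli toy version of the idea (TS-GFN maintains an approximate posterior over GFlowNet policies rather than over arm means):

```python
import numpy as np

rng = np.random.default_rng(0)
true_p = np.array([0.2, 0.5, 0.7])      # unknown arm reward probabilities
alpha = np.ones(3); beta = np.ones(3)   # Beta(1,1) posterior per arm

for _ in range(1000):
    theta = rng.beta(alpha, beta)       # one posterior sample per arm
    a = int(np.argmax(theta))           # act greedily w.r.t. the sample
    r = rng.random() < true_p[a]
    alpha[a] += r; beta[a] += 1 - r     # posterior update
print(alpha / (alpha + beta))           # posterior means concentrate on 0.7
```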
    Global Optimality in Bivariate Gradient-based DAG Learning. (arXiv:2306.17378v1 [cs.LG])
    Recently, a new class of non-convex optimization problems motivated by the statistical problem of learning an acyclic directed graphical model from data has attracted significant interest. While existing work uses standard first-order optimization schemes to solve this problem, proving the global optimality of such approaches has proven elusive. The difficulty lies in the fact that unlike other non-convex problems in the literature, this problem is not "benign", and possesses multiple spurious solutions that standard approaches can easily get trapped in. In this paper, we prove that a simple path-following optimization scheme globally converges to the global minimum of the population loss in the bivariate setting.
    The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit. (arXiv:2306.17759v1 [stat.ML])
    In deep learning theory, the covariance matrix of the representations serves as a proxy to examine the network's trainability. Motivated by the success of Transformers, we study the covariance matrix of a modified Softmax-based attention model with skip connections in the proportional limit of infinite-depth-and-width. We show that at initialization the limiting distribution can be described by a stochastic differential equation (SDE) indexed by the depth-to-width ratio. To achieve a well-defined stochastic limit, the Transformer's attention mechanism is modified by centering the Softmax output at identity, and scaling the Softmax logits by a width-dependent temperature parameter. We examine the stability of the network through the corresponding SDE, showing how the scale of both the drift and diffusion can be elegantly controlled with the aid of residual connections. The existence of a stable SDE implies that the covariance structure is well-behaved, even for very large depth and width, thus preventing the notorious issues of rank degeneracy in deep attention models. Finally, we show, through simulations, that the SDE provides a surprisingly good description of the corresponding finite-size model. We coin the name shaped Transformer for these architectural modifications.
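A numpy sketch of the two modifications the abstract names: centering the Softmax output at the identity, and scaling the logits with a width-dependent temperature. The temperature's exact form and the gamma scaling below are illustrative choices, not the paper's:

```python
import numpy as np

def shaped_attention(Q, K, gamma=0.5):
    n, d = Q.shape
    tau = np.sqrt(d)                       # width-dependent temperature (assumed form)
    logits = (Q @ K.T) / (np.sqrt(d) * tau)
    P = np.exp(logits - logits.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)
    # Center the Softmax at identity: identity plus a scaled, row-mean-zero
    # perturbation, so attention is a small deviation from a skip connection.
    return np.eye(n) + gamma * (P - np.ones((n, n)) / n)

A = shaped_attention(*np.random.default_rng(0).standard_normal((2, 16, 8)))
print(np.allclose(A.sum(axis=1), 1.0))     # rows still sum to one
```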
    Learning Bilinear Models of Actuated Koopman Generators from Partially-Observed Trajectories. (arXiv:2209.09977v2 [math.DS] UPDATED)
    Data-driven models for nonlinear dynamical systems based on approximating the underlying Koopman operator or generator have proven to be successful tools for forecasting, feature learning, state estimation, and control. It has become well known that the Koopman generators for control-affine systems also have affine dependence on the input, leading to convenient finite-dimensional bilinear approximations of the dynamics. Yet there are still two main obstacles that limit the scope of current approaches for approximating the Koopman generators of systems with actuation. First, the performance of existing methods depends heavily on the choice of basis functions over which the Koopman generator is to be approximated; and there is currently no universal way to choose them for systems that are not measure preserving. Secondly, if we do not observe the full state, we may not gain access to a sufficiently rich collection of such functions to describe the dynamics. This is because the commonly used method of forming time-delayed observables fails when there is actuation. To remedy these issues, we write the dynamics of observables governed by the Koopman generator as a bilinear hidden Markov model, and determine the model parameters using the expectation-maximization (EM) algorithm. The E-step involves a standard Kalman filter and smoother, while the M-step resembles control-affine dynamic mode decomposition for the generator. We demonstrate the performance of this method on three examples, including recovery of a finite-dimensional Koopman-invariant subspace for an actuated system with a slow manifold; estimation of Koopman eigenfunctions for the unforced Duffing equation; and model-predictive control of a fluidic pinball system based only on noisy observations of lift and drag.
    Unsupervised Text Embedding Space Generation Using Generative Adversarial Networks for Text Synthesis. (arXiv:2306.17181v1 [cs.CL])
    Generative Adversarial Networks (GAN) is a model for data synthesis, which creates plausible data through the competition of generator and discriminator. Although GAN application to image synthesis is extensively studied, it has inherent limitations to natural language generation. Because natural language is composed of discrete tokens, a generator has difficulty updating its gradient through backpropagation; therefore, most text-GAN studies generate sentences starting with a random token based on a reward system. Thus, the generators of previous studies are pre-trained in an autoregressive way before adversarial training, causing data memorization that synthesized sentences reproduce the training data. In this paper, we synthesize sentences using a framework similar to the original GAN. More specifically, we propose Text Embedding Space Generative Adversarial Networks (TESGAN) which generate continuous text embedding spaces instead of discrete tokens to solve the gradient backpropagation problem. Furthermore, TESGAN conducts unsupervised learning which does not directly refer to the text of the training data to overcome the data memorization issue. By adopting this novel method, TESGAN can synthesize new sentences, showing the potential of unsupervised learning for text synthesis. We expect to see extended research combining Large Language Models with a new perspective of viewing text as an continuous space.
ReLU Neural Networks, Polyhedral Decompositions, and Persistent Homology. (arXiv:2306.17418v1 [math.AT])
    A ReLU neural network leads to a finite polyhedral decomposition of input space and a corresponding finite dual graph. We show that while this dual graph is a coarse quantization of input space, it is sufficiently robust that it can be combined with persistent homology to detect homological signals of manifolds in the input space from samples. This property holds for a variety of networks trained for a wide range of purposes that have nothing to do with this topological application. We found this feature to be surprising and interesting; we hope it will also be useful.
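The polyhedral decomposition is indexed by ReLU activation patterns: two inputs lie in the same cell exactly when every ReLU has the same on/off state on both. A small numpy sketch that enumerates the cells a sample hits (network sizes are illustrative); edges of the dual graph would connect the patterns of adjacent cells:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((16, 2)), rng.standard_normal(16)
W2, b2 = rng.standard_normal((16, 16)), rng.standard_normal(16)

def activation_pattern(x):
    """Binary sign vector of all ReLUs; inputs sharing a pattern lie in
    the same polyhedron of the decomposition."""
    h1 = W1 @ x + b1
    h2 = W2 @ np.maximum(h1, 0.0) + b2
    return tuple((np.concatenate([h1, h2]) > 0).astype(int))

# Sample the input space; distinct patterns enumerate visited polyhedra.
samples = rng.uniform(-2, 2, size=(5000, 2))
regions = {activation_pattern(x) for x in samples}
print(len(regions), "polyhedra hit by 5000 samples")
```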
    iSCAN: Identifying Causal Mechanism Shifts among Nonlinear Additive Noise Models. (arXiv:2306.17361v1 [cs.LG])
Structural causal models (SCMs) are widely used in various disciplines to represent causal relationships among variables in complex systems. Unfortunately, the true underlying directed acyclic graph (DAG) structure is often unknown, and determining it from observational or interventional data remains a challenging task. However, in many situations, the end goal is to identify changes (shifts) in causal mechanisms between related SCMs rather than recovering the entire underlying DAG structure. Examples include analyzing gene regulatory network structure changes between healthy and cancerous individuals or understanding variations in biological pathways under different cellular contexts. This paper focuses on identifying functional mechanism shifts in two or more related SCMs over the same set of variables -- without estimating the entire DAG structure of each SCM. Prior work under this setting assumed linear models with Gaussian noise; instead, in this work we assume that each SCM belongs to the more general class of nonlinear additive noise models (ANMs). A key contribution of this work is to show that the Jacobian of the score function for the mixture distribution allows for identification of shifts in general non-parametric functional mechanisms. Once the shifted variables are identified, we leverage recent work to estimate the structural differences, if any, for the shifted variables. Experiments on synthetic and real-world data are provided to showcase the applicability of this approach.
    $\lambda$-AC: Learning latent decision-aware models for reinforcement learning in continuous state-spaces. (arXiv:2306.17366v1 [cs.LG])
    The idea of decision-aware model learning, that models should be accurate where it matters for decision-making, has gained prominence in model-based reinforcement learning. While promising theoretical results have been established, the empirical performance of algorithms leveraging a decision-aware loss has been lacking, especially in continuous control problems. In this paper, we present a study on the necessary components for decision-aware reinforcement learning models and we showcase design choices that enable well-performing algorithms. To this end, we provide a theoretical and empirical investigation into prominent algorithmic ideas in the field. We highlight that empirical design decisions established in the MuZero line of works are vital to achieving good performance for related algorithms, and we showcase differences in behavior between different instantiations of value-aware algorithms in stochastic environments. Using these insights, we propose the Latent Model-Based Decision-Aware Actor-Critic framework ($\lambda$-AC) for decision-aware model-based reinforcement learning in continuous state-spaces and highlight important design choices in different environments.
    Multigrid-Augmented Deep Learning for the Helmholtz Equation: Better Scalability with Compact Implicit Layers. (arXiv:2306.17486v1 [cs.LG])
    We present a deep learning-based iterative approach to solve the discrete heterogeneous Helmholtz equation for high wavenumbers. Combining classical iterative multigrid solvers and convolutional neural networks (CNNs) via preconditioning, we obtain a learned neural solver that is faster and scales better than a standard multigrid solver. Our approach offers three main contributions over previous neural methods of this kind. First, we construct a multilevel U-Net-like encoder-solver CNN with an implicit layer on the coarsest grid of the U-Net, where convolution kernels are inverted. This alleviates the field of view problem in CNNs and allows better scalability. Second, we improve upon the previous CNN preconditioner in terms of the number of parameters, computation time, and convergence rates. Third, we propose a multiscale training approach that enables the network to scale to problems of previously unseen dimensions while still maintaining a reasonable training procedure. Our encoder-solver architecture can be used to generalize over different slowness models of various difficulties and is efficient at solving for many right-hand sides per slowness model. We demonstrate the benefits of our novel architecture with numerical experiments on a variety of heterogeneous two-dimensional problems at high wavenumbers.
    Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting. (arXiv:2306.17563v1 [cs.IR])
    Ranking documents using Large Language Models (LLMs) by directly feeding the query and candidate documents into the prompt is an interesting and practical problem. However, there has been limited success so far, as researchers have found it difficult to outperform fine-tuned baseline rankers on benchmark datasets. We analyze pointwise and listwise ranking prompts used by existing methods and argue that off-the-shelf LLMs do not fully understand these ranking formulations, possibly due to the nature of how LLMs are trained. In this paper, we propose to significantly reduce the burden on LLMs by using a new technique called Pairwise Ranking Prompting (PRP). Our results are the first in the literature to achieve state-of-the-art ranking performance on standard benchmarks using moderate-sized open-sourced LLMs. On TREC-DL2020, PRP based on the Flan-UL2 model with 20B parameters outperforms the previous best approach in the literature, which is based on the blackbox commercial GPT-4 that has 50x (estimated) model size, by over 5% at NDCG@1. On TREC-DL2019, PRP is only inferior to the GPT-4 solution on the NDCG@5 and NDCG@10 metrics, while outperforming other existing solutions, such as InstructGPT which has 175B parameters, by over 10% for nearly all ranking metrics. Furthermore, we propose several variants of PRP to improve efficiency and show that it is possible to achieve competitive results even with linear complexity. We also discuss other benefits of PRP, such as supporting both generation and scoring LLM APIs, as well as being insensitive to input ordering.
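The all-pairs variant of PRP is straightforward to sketch: prompt the model with the query and two candidates, tally pairwise wins, and sort. The `llm_prefers` stub below fakes the model call with token overlap so the sketch runs standalone; in PRP the comparison is an actual pairwise prompt to the LLM, and sliding-window variants bring the cost down to linear:

```python
from itertools import combinations

def llm_prefers(query, doc_a, doc_b):
    """Stand-in for a pairwise ranking prompt. In PRP the prompt asks the
    LLM which of the two passages better answers the query; here we fake
    the answer with token overlap so the sketch runs without a model."""
    score = lambda d: len(set(query.split()) & set(d.split()))
    return score(doc_a) >= score(doc_b)

def prp_rank(query, docs):
    """All-pairs Pairwise Ranking Prompting: a document's score is its
    number of pairwise wins (linear-complexity variants not shown)."""
    wins = {d: 0 for d in docs}
    for a, b in combinations(docs, 2):
        winner = a if llm_prefers(query, a, b) else b
        wins[winner] += 1
    return sorted(docs, key=wins.get, reverse=True)

docs = ["pairwise ranking with language models",
        "a recipe for sourdough bread",
        "ranking documents with pairwise prompts and language models"]
print(prp_rank("pairwise ranking language models", docs))
```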
    Act3D: Infinite Resolution Action Detection Transformer for Robotic Manipulation. (arXiv:2306.17817v1 [cs.RO])
3D perceptual representations are well suited for robot manipulation as they easily encode occlusions and simplify spatial reasoning. Many manipulation tasks require high spatial precision in end-effector pose prediction, typically demanding high-resolution 3D perceptual grids that are computationally expensive to process. As a result, most manipulation policies operate directly in 2D, foregoing 3D inductive biases. In this paper, we propose Act3D, a manipulation policy Transformer that casts 6-DoF keypose prediction as 3D detection with adaptive spatial computation. It takes as input 3D feature clouds unprojected from one or more camera views, iteratively samples 3D point grids in free space in a coarse-to-fine manner, featurizes them using relative spatial attention to the physical feature cloud, and selects the best feature point for end-effector pose prediction. Act3D sets a new state of the art on RLBench, an established manipulation benchmark. Our model achieves a 10% absolute improvement over the previous SOTA 2D multi-view policy on 74 RLBench tasks and a 22% absolute improvement, with 3x less compute, over the previous SOTA 3D policy. In thorough ablations, we show the importance of relative spatial attention, large-scale vision-language pre-trained 2D backbones, and weight tying across coarse-to-fine attentions. Code and videos are available at our project site: https://act3d.github.io/.
    Optimal Execution Using Reinforcement Learning. (arXiv:2306.17178v1 [q-fin.TR])
This work is about optimal order execution, where a large order is split into several small orders to minimize the implementation shortfall. Based on the diversity of cryptocurrency exchanges, we attempt to extract cross-exchange signals by aligning data from multiple exchanges for the first time. Unlike most previous studies, which focused on using single-exchange information, we discuss the impact of cross-exchange signals on the agent's decision-making in the optimal execution problem. Experimental results show that cross-exchange signals can provide additional information for the optimal execution of cryptocurrency and facilitate the optimal execution process.
    Subgraph Stationary Hardware-Software Inference Co-Design. (arXiv:2306.17266v1 [cs.DC])
A growing number of applications depend on Machine Learning (ML) functionality and benefit from both higher-quality ML predictions and better timeliness (latency) at the same time. A growing body of research in computer architecture, ML, and systems software focuses on reaching better latency-accuracy tradeoffs for ML models. Efforts include compression, quantization, pruning, early-exit models, mixed DNN precision, as well as ML inference accelerator designs that minimize latency and energy while preserving delivered accuracy. All of them, however, yield improvements for a single static point in the latency-accuracy tradeoff space. We make a case for applications that operate in dynamically changing deployment scenarios, where no single static point is optimal. We draw on a recently proposed weight-shared SuperNet mechanism to enable serving a stream of queries that uses (activates) different SubNets within this weight-shared construct. This creates an opportunity to exploit the inherent temporal locality with our proposed SubGraph Stationary (SGS) optimization. We take a hardware-software co-design approach with a real implementation of SGS in SushiAccel and the implementation of a software scheduler, SushiSched, controlling which SubNets to serve and what to cache in real time. Combined, they are vertically integrated into SUSHI, an inference serving stack. For the stream of queries, SUSHI yields up to a 25% improvement in latency and a 0.98% increase in served accuracy. SUSHI can achieve up to 78.7% off-chip energy savings.
    Empirical Interpretation of the Relationship Between Speech Acoustic Context and Emotion Recognition. (arXiv:2306.17500v1 [cs.SD])
Speech emotion recognition (SER) is vital for obtaining emotional intelligence and understanding the contextual meaning of speech. Variations of consonant-vowel (CV) phonemic boundaries can enrich acoustic context with linguistic cues, which impacts SER. In practice, speech emotions are treated as single labels over an acoustic segment for a given time duration. However, phone boundaries within speech are not discrete events, therefore the perceived emotion state should also be distributed over potentially continuous time-windows. This research explores the implication of acoustic context and phone boundaries on local markers for SER using an attention-based approach. The benefits of using a distributed approach to speech emotion understanding are supported by the results of cross-corpora analysis experiments. In these experiments, phones and words are mapped to the attention vectors, along with the fundamental frequency, to observe the overlapping distributions and thereby the relationship between acoustic context and emotion. This work aims to bridge psycholinguistic theory with computational modelling for SER.
    Locking On: Leveraging Dynamic Vehicle-Imposed Motion Constraints to Improve Visual Localization. (arXiv:2306.17529v1 [cs.RO])
Most 6-DoF localization and SLAM systems use static landmarks but ignore dynamic objects because they cannot be usefully incorporated into a typical pipeline. Where dynamic objects have been incorporated, typical approaches have attempted relatively sophisticated identification and localization of these objects, limiting their robustness or general utility. In this research, we propose a middle ground, demonstrated in the context of autonomous vehicles, using dynamic vehicles to provide limited pose constraint information in a 6-DoF frame-by-frame PnP-RANSAC localization pipeline. We refine initial pose estimates with a motion model and propose a method for calculating the predicted quality of future pose estimates, triggered based on whether or not the autonomous vehicle's motion is constrained by the relative frame-to-frame location of dynamic vehicles in the environment. Our approach detects and identifies suitable dynamic vehicles to define these pose constraints to modify a pose filter, resulting in improved recall across a range of localization tolerances from 0.25 m to 5 m, compared to a state-of-the-art baseline single image PnP method and its vanilla pose filtering. Our constraint detection system is active for approximately 35% of the time on the Ford AV dataset and localization is particularly improved when the constraint detection is active.
    Kernel $\epsilon$-Greedy for Contextual Bandits. (arXiv:2306.17329v1 [stat.ML])
    We consider a kernelized version of the $\epsilon$-greedy strategy for contextual bandits. More precisely, in a setting with finitely many arms, we consider that the mean reward functions lie in a reproducing kernel Hilbert space (RKHS). We propose an online weighted kernel ridge regression estimator for the reward functions. Under some conditions on the exploration probability sequence, $\{\epsilon_t\}_t$, and choice of the regularization parameter, $\{\lambda_t\}_t$, we show that the proposed estimator is consistent. We also show that for any choice of kernel and the corresponding RKHS, we achieve a sub-linear regret rate depending on the intrinsic dimensionality of the RKHS. Furthermore, we achieve the optimal regret rate of $\sqrt{T}$ under a margin condition for finite-dimensional RKHS.
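A numpy sketch of the strategy being analyzed: an RBF kernel ridge regressor per arm for the mean reward, with a decaying exploration probability. The kernel, the $\epsilon_t$ schedule, and the fixed regularization are illustrative stand-ins for the sequences governed by the paper's conditions:

```python
import numpy as np

def rbf(X, Z, gamma=1.0):
    d = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

class KernelEpsGreedy:
    """Per-arm kernel ridge regression + epsilon-greedy arm choice.
    Schedules for eps_t and lambda_t are illustrative, not the paper's."""
    def __init__(self, n_arms, lam=1.0):
        self.data = [([], []) for _ in range(n_arms)]
        self.lam = lam

    def choose(self, x, t, rng):
        eps = min(1.0, 1.0 / np.sqrt(t + 1))          # decaying exploration
        if rng.random() < eps:
            return rng.integers(len(self.data))
        preds = []
        for X, y in self.data:
            if not X:
                preds.append(0.0); continue
            X, y = np.array(X), np.array(y)
            K = rbf(X, X)
            alpha = np.linalg.solve(K + self.lam * np.eye(len(X)), y)
            preds.append(float(rbf(x[None], X) @ alpha))
        return int(np.argmax(preds))

    def update(self, arm, x, r):
        self.data[arm][0].append(x); self.data[arm][1].append(r)

rng = np.random.default_rng(0)
bandit = KernelEpsGreedy(n_arms=2)
for t in range(200):
    x = rng.random(2)
    arm = bandit.choose(x, t, rng)
    r = float(x[arm]) + 0.1 * rng.standard_normal()   # toy contextual rewards
    bandit.update(arm, x, r)
```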
    Provable Robust Watermarking for AI-Generated Text. (arXiv:2306.17439v1 [cs.CL])
    As AI-generated text increasingly resembles human-written content, the ability to detect machine-generated text becomes crucial. To address this challenge, we present GPTWatermark, a robust and high-quality solution designed to ascertain whether a piece of text originates from a specific model. Our approach extends existing watermarking strategies and employs a fixed group design to enhance robustness against editing and paraphrasing attacks. We show that our watermarked language model enjoys strong provable guarantees on generation quality, correctness in detection, and security against evasion attacks. Experimental results on various large language models (LLMs) and diverse datasets demonstrate that our method achieves superior detection accuracy and comparable generation quality in perplexity, thus promoting the responsible use of LLMs.
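The fixed group design and the detection side are simple to sketch: one global, key-derived "green" half of the vocabulary, with detection via a z-score on the green-token count. The hashing scheme, fraction, and threshold below are illustrative, not GPTWatermark's exact construction (the generation-time up-weighting of green-token logits is omitted):

```python
import hashlib

def green_list(vocab_size, fraction=0.5, key=b"watermark-key"):
    """Fixed group design: one global, key-derived split of the vocabulary
    into 'green' and 'red' tokens (contrast with per-position lists)."""
    def is_green(tok):
        h = hashlib.sha256(key + tok.to_bytes(4, "big")).digest()
        return h[0] < 256 * fraction
    return {t for t in range(vocab_size) if is_green(t)}

def detect(tokens, greens, fraction=0.5):
    """z-score of the green-token count; watermarked text is generated
    with green tokens up-weighted, so its z-score is large."""
    n = len(tokens)
    k = sum(t in greens for t in tokens)
    return (k - fraction * n) / (fraction * (1 - fraction) * n) ** 0.5

greens = green_list(vocab_size=50_000)
sample = [7, 42, 1001, 250, 3]          # toy token ids
print(detect(sample, greens))            # compare against a threshold, e.g. 4
```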
    Guided Deep Generative Model-based Spatial Regularization for Multiband Imaging Inverse Problems. (arXiv:2306.17197v1 [eess.IV])
When adopting a model-based formulation, solving inverse problems encountered in multiband imaging requires defining spatial and spectral regularizations. In most works in the literature, spectral information is extracted directly from the observations to derive data-driven spectral priors. Conversely, the choice of the spatial regularization often boils down to the use of conventional penalizations (e.g., total variation) promoting expected features of the reconstructed image (e.g., piecewise constant). In this work, we propose a generic framework able to capitalize on an auxiliary acquisition of high spatial resolution to derive tailored data-driven spatial regularizations. This approach leverages the ability of deep learning to extract high-level features. More precisely, the regularization is conceived as a deep generative network able to encode spatial semantic features contained in this auxiliary image of high spatial resolution. To illustrate the versatility of this approach, it is instantiated on two particular tasks, namely multiband image fusion and multiband image inpainting. Experimental results obtained on these two tasks demonstrate the benefit of this class of informed regularizations when compared to more conventional ones.
    Enterprise Disk Drive Scrubbing Based on Mondrian Conformal Predictors. (arXiv:2306.17169v1 [cs.DC])
Disk scrubbing is a process aimed at resolving read errors on disks by reading data from the disk. However, scrubbing the entire storage array at once can adversely impact system performance, particularly during periods of high input/output operations. Additionally, the continuous reading of data from disks when scrubbing can result in wear and tear, especially on larger capacity disks, due to the significant time and energy consumption involved. To address these issues, we propose a selective disk scrubbing method that enhances the overall reliability and power efficiency in data centers. Our method employs a Machine Learning model based on Mondrian Conformal prediction to identify specific disks for scrubbing, by proactively predicting the health status of each disk in the storage pool, forecasting n days in advance, and using an open-source dataset. For disks predicted as non-healthy, we mark them for replacement without further action. For healthy drives, we create a set and quantify their relative health across the entire storage pool based on the predictor's confidence. This enables us to prioritize selective scrubbing for drives with established scrubbing frequency based on the scrub cycle. The method we propose provides an efficient and dependable solution for managing enterprise disk drives. By scrubbing just 22.7% of the total storage disks, we can achieve optimized energy consumption and reduce the carbon footprint of the data center.  ( 3 min )
    DisasterResponseGPT: Large Language Models for Accelerated Plan of Action Development in Disaster Response Scenarios. (arXiv:2306.17271v1 [cs.LG])
    The development of plans of action in disaster response scenarios is a time-consuming process. Large Language Models (LLMs) offer a powerful solution to expedite this process through in-context learning. This study presents DisasterResponseGPT, an algorithm that leverages LLMs to generate valid plans of action quickly by incorporating disaster response and planning guidelines in the initial prompt. In DisasterResponseGPT, users input the scenario description and receive a plan of action as output. The proposed method generates multiple plans within seconds, which can be further refined following the user's feedback. Preliminary results indicate that the plans of action developed by DisasterResponseGPT are comparable to human-generated ones while offering greater ease of modification in real-time. This approach has the potential to revolutionize disaster response operations by enabling rapid updates and adjustments during the plan's execution.
    FedBone: Towards Large-Scale Federated Multi-Task Learning. (arXiv:2306.17465v1 [cs.LG])
Heterogeneous federated multi-task learning (HFMTL) is a federated learning technique that combines heterogeneous tasks of different clients to achieve more accurate, comprehensive predictions. In real-world applications, visual and natural language tasks typically require large-scale models to extract high-level abstract features. However, large-scale models cannot be directly applied to existing federated multi-task learning methods. Existing HFMTL methods also disregard the impact of gradient conflicts on multi-task optimization during the federated aggregation process. In this work, we propose an innovative framework called FedBone, which enables the construction of large-scale models with better generalization from the perspective of server-client split learning and gradient projection. We split the entire model into two components: a large-scale general model (referred to as the general model) on the cloud server and multiple task-specific models (referred to as the client model) on edge clients, solving the problem of insufficient computing power on edge clients. The conflicting gradient projection technique is used to enhance the generalization of the large-scale general model between different tasks. The proposed framework is evaluated on two benchmark datasets and a real ophthalmic dataset. Comprehensive results demonstrate that FedBone efficiently adapts to heterogeneous local tasks of each client and outperforms existing federated learning algorithms in most dense prediction and classification tasks with off-the-shelf computational resources on the client side.
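The conflicting-gradient projection idea can be shown with the generic PCGrad-style rule: when two task gradients point against each other, remove from each the component along the other. A numpy sketch of that rule (FedBone's exact server-side procedure may differ):

```python
import numpy as np

def project_conflicting(grads):
    """PCGrad-style projection: if two task gradients have negative inner
    product, subtract from each its component along the other. A generic
    version of the idea, not FedBone's exact procedure."""
    out = [g.copy() for g in grads]
    for i, g in enumerate(out):
        for j, h in enumerate(grads):
            if i != j and g @ h < 0:
                g -= (g @ h) / (h @ h) * h
    return out

g1, g2 = np.array([1.0, 1.0]), np.array([-1.0, 0.5])  # conflicting tasks
p1, p2 = project_conflicting([g1, g2])
print(p1 @ g2, p2 @ g1)   # conflicts removed: both ~0
```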
    Machine learning for sports betting: should predictive models be optimised for accuracy or calibration?. (arXiv:2303.06021v3 [cs.LG] UPDATED)
    Sports betting's recent federal legalisation in the USA coincides with the golden age of machine learning. If bettors can leverage data to reliably predict the probability of an outcome, they can recognise when the bookmaker's odds are in their favour. As sports betting is a multi-billion dollar industry in the USA alone, identifying such opportunities could be extremely lucrative. Many researchers have applied machine learning to the sports outcome prediction problem, generally using accuracy to evaluate the performance of predictive models. We hypothesise that for the sports betting problem, model calibration is more important than accuracy. To test this hypothesis, we train models on NBA data over several seasons and run betting experiments on a single season, using published odds. We show that optimising the predictive model for calibration leads to greater returns than optimising for accuracy, on average (return on investment of $+34.69\%$ versus $-35.17\%$) and in the best case ($+36.93\%$ versus $+5.56\%$). These findings suggest that for sports betting (or any probabilistic decision-making problem), calibration is a more important metric than accuracy. Sports bettors who wish to increase profits should therefore optimise their predictive model for calibration.
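Why calibration matters is visible in the betting rule itself: a bet has positive expected value only when the model's probability beats the bookmaker's implied probability, so systematically inflated probabilities create phantom value. A minimal sketch (decimal odds; the numbers are illustrative):

```python
def expected_value(p_model, decimal_odds):
    """Expected profit per unit stake: positive only when the model
    probability exceeds the implied probability 1 / odds."""
    return p_model * decimal_odds - 1.0

def should_bet(p_model, decimal_odds, margin=0.0):
    return expected_value(p_model, decimal_odds) > margin

# A miscalibrated model that outputs 0.65 when the true rate is 0.55
# will take bets at odds of 1.7 (implied p = 0.588) and lose on average;
# calibration, not accuracy, decides whether these EV estimates are real.
print(expected_value(0.65, 1.7), expected_value(0.55, 1.7))
```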
    Limits of Model Selection under Transfer Learning. (arXiv:2305.00152v3 [stat.ML] UPDATED)
    Theoretical studies on transfer learning or domain adaptation have so far focused on situations with a known hypothesis class or model; however in practice, some amount of model selection is usually involved, often appearing under the umbrella term of hyperparameter-tuning: for example, one may think of the problem of tuning for the right neural network architecture towards a target task, while leveraging data from a related source task. Now, in addition to the usual tradeoffs on approximation vs estimation errors involved in model selection, this problem brings in a new complexity term, namely, the transfer distance between source and target distributions, which is known to vary with the choice of hypothesis class. We present a first study of this problem, focusing on classification; in particular, the analysis reveals some remarkable phenomena: adaptive rates, i.e., those achievable with no distributional information, can be arbitrarily slower than oracle rates, i.e., when given knowledge on distances.
    Integrating Tick-level Data and Periodical Signal for High-frequency Market Making. (arXiv:2306.17179v1 [q-fin.TR])
    We focus on the problem of market making in high-frequency trading. Market making is a critical function in financial markets that involves providing liquidity by buying and selling assets. However, the increasing complexity of financial markets and the high volume of data generated by tick-level trading makes it challenging to develop effective market making strategies. To address this challenge, we propose a deep reinforcement learning approach that fuses tick-level data with periodic prediction signals to develop a more accurate and robust market making strategy. Our results of market making strategies based on different deep reinforcement learning algorithms under the simulation scenarios and real data experiments in the cryptocurrency markets show that the proposed framework outperforms existing methods in terms of profitability and risk management.  ( 2 min )
    Steganographic Capacity of Deep Learning Models. (arXiv:2306.17189v1 [cs.CR])
    As machine learning and deep learning models become ubiquitous, it is inevitable that there will be attempts to exploit such models in various attack scenarios. For example, in a steganographic-based attack, information could be hidden in a learning model, which might then be used to distribute malware, or for other malicious purposes. In this research, we consider the steganographic capacity of several learning models. Specifically, we train a Multilayer Perceptron (MLP), Convolutional Neural Network (CNN), and Transformer model on a challenging malware classification problem. For each of the resulting models, we determine the number of low-order bits of the trained parameters that can be altered without significantly affecting the performance of the model. We find that the steganographic capacity of the learning models tested is surprisingly high, and that in each case, there is a clear threshold after which model performance rapidly degrades.  ( 2 min )
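The capacity being measured corresponds to overwriting low-order mantissa bits. A numpy sketch that hides and recovers one n_low-bit chunk per float32 weight (purely illustrative of the mechanism, not an attack tool):

```python
import numpy as np

def embed_bits(weights, chunks, n_low=8):
    """Hide payload chunks in the n_low least-significant mantissa bits
    of float32 weights (one chunk per weight). Illustrative only."""
    raw = weights.astype(np.float32).view(np.uint32)
    mask = np.uint32((1 << n_low) - 1)
    for i, chunk in enumerate(np.array(chunks, dtype=np.uint32)):
        raw[i] = (raw[i] & ~mask) | (chunk & mask)
    return raw.view(np.float32)

def extract_bits(weights, count, n_low=8):
    raw = weights.view(np.uint32)
    mask = np.uint32((1 << n_low) - 1)
    return [int(raw[i] & mask) for i in range(count)]

w = np.random.default_rng(0).standard_normal(100).astype(np.float32)
payload = [0xAB, 0xCD, 0xEF]
w2 = embed_bits(w, payload)
print(extract_bits(w2, 3), np.max(np.abs(w - w2)))  # payload back, tiny drift
```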
    Federated Ensemble YOLOv5 - A Better Generalized Object Detection Algorithm. (arXiv:2306.17829v1 [cs.CV])
Federated learning (FL) has gained significant traction as a privacy-preserving algorithm, but the resemblance of federated learning algorithms such as Federated Averaging (FedAvg) or Federated SGD (FedSGD) to ensemble learning algorithms has not been fully explored. The purpose of this paper is to examine the application of FL to object detection as a method to enhance generalizability, and to compare its performance against a centralized training approach for an object detection algorithm. Specifically, we investigate the performance of a YOLOv5 model trained using FL across multiple clients and employ a random sampling strategy without replacement, so each client holds a portion of the same dataset used for centralized training. Our experimental results showcase the superior efficiency of the FL object detector's global model in generating accurate bounding boxes for unseen objects, with the test set being a mixture of objects from two distinct clients not represented in the training dataset. These findings suggest that FL can be viewed from an ensemble-algorithm perspective, akin to a synergistic blend of Bagging and Boosting techniques. As a result, FL can be seen not only as a method to enhance privacy, but also as a method to enhance the performance of a machine learning model.
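FedAvg itself, the object of the ensemble analogy, is just a data-size-weighted average of client weights. A PyTorch sketch (the client counts and the tiny model are illustrative):

```python
import torch

def fedavg(client_states, client_sizes):
    """Federated Averaging: the global model is the data-size-weighted
    mean of client weights -- structurally close to averaging an ensemble,
    which is the analogy the abstract draws. Sketch, not the paper's code."""
    total = float(sum(client_sizes))
    return {
        k: sum(s[k] * (n / total) for s, n in zip(client_states, client_sizes))
        for k in client_states[0].keys()
    }

# Usage with any torch modules sharing one architecture:
clients = [torch.nn.Linear(4, 2) for _ in range(3)]
global_state = fedavg([c.state_dict() for c in clients],
                      client_sizes=[100, 50, 25])
clients[0].load_state_dict(global_state)
```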
    Learning Environment Models with Continuous Stochastic Dynamics. (arXiv:2306.17204v1 [cs.LG])
    Solving control tasks in complex environments automatically through learning offers great potential. While contemporary techniques from deep reinforcement learning (DRL) provide effective solutions, their decision-making is not transparent. We aim to provide insights into the decisions faced by the agent by learning an automaton model of environmental behavior under the control of an agent. However, for most control problems, automata learning is not scalable enough to learn a useful model. In this work, we raise the capabilities of automata learning such that it is possible to learn models for environments that have complex and continuous dynamics. The core of the scalability of our method lies in the computation of an abstract state-space representation, by applying dimensionality reduction and clustering on the observed environmental state space. The stochastic transitions are learned via passive automata learning from observed interactions of the agent and the environment. In an iterative model-based RL process, we sample additional trajectories to learn an accurate environment model in the form of a discrete-state Markov decision process (MDP). We apply our automata learning framework on popular RL benchmarking environments in the OpenAI Gym, including LunarLander, CartPole, Mountain Car, and Acrobot. Our results show that the learned models are so precise that they enable the computation of policies solving the respective control tasks. Yet the models are more concise and more general than neural-network-based policies and by using MDPs we benefit from a wealth of tools available for analyzing them. When solving the task of LunarLander, the learned model even achieved similar or higher rewards than deep RL policies learned with stable-baselines3.  ( 3 min )
    Fast and Robust State Estimation and Tracking via Hierarchical Learning. (arXiv:2306.17267v1 [cs.LG])
Fully distributed estimation and tracking solutions for large-scale multi-agent networks suffer from slow convergence and are vulnerable to network failures. In this paper, we aim to speed up convergence and enhance the resilience of state estimation and tracking using a simple hierarchical system architecture wherein agents are clustered into smaller networks and a parameter server exists to aid information exchange among networks. The information exchange among networks is expensive and occurs only once in a while. We propose two consensus + innovation algorithms, for the state estimation and tracking problems respectively. In both algorithms, we use a novel hierarchical push-sum consensus component. For state estimation, we use dual averaging as the local innovation component. State tracking is much harder to tackle in the presence of dropping-link failures, and the standard integration of the consensus and innovation approaches is no longer applicable; moreover, dual averaging is no longer feasible. Our algorithm introduces a pair of additional variables per link, ensures the relevant local variables evolve according to the state dynamics, and uses projected local gradient descent as the local innovation component. We also characterize the convergence rates of both algorithms under a linear local observation model and minimal technical assumptions. We numerically validate our algorithms through simulations of both state estimation and tracking problems.  ( 2 min )
    TTSWING: a Dataset for Table Tennis Swing Analysis. (arXiv:2306.17550v1 [cs.LG])
    We introduce TTSWING, a novel dataset designed for table tennis swing analysis. This dataset comprises comprehensive swing information obtained through 9-axis sensors integrated into custom-made racket grips, accompanied by anonymized demographic data of the players. We detail the data collection and annotation procedures. Furthermore, we conduct pilot studies utilizing diverse machine learning models for swing analysis. TTSWING holds tremendous potential to facilitate innovative research in table tennis analysis and is a valuable resource for the scientific community. We release the dataset and experimental codes at https://github.com/DEPhantom/TTSWING.
    Towards Zero-Shot Scale-Aware Monocular Depth Estimation. (arXiv:2306.17253v1 [cs.CV])
    Monocular depth estimation is scale-ambiguous, and thus requires scale supervision to produce metric predictions. Even so, the resulting models will be geometry-specific, with learned scales that cannot be directly transferred across domains. Because of that, recent works focus instead on relative depth, eschewing scale in favor of improved up-to-scale zero-shot transfer. In this work we introduce ZeroDepth, a novel monocular depth estimation framework capable of predicting metric scale for arbitrary test images from different domains and camera parameters. This is achieved by (i) the use of input-level geometric embeddings that enable the network to learn a scale prior over objects; and (ii) decoupling the encoder and decoder stages, via a variational latent representation that is conditioned on single frame information. We evaluated ZeroDepth targeting both outdoor (KITTI, DDAD, nuScenes) and indoor (NYUv2) benchmarks, and achieved a new state-of-the-art in both settings using the same pre-trained model, outperforming methods that train on in-domain data and require test-time scaling to produce metric estimates.  ( 2 min )
    Residual Feature Pyramid Network for Enhancement of Vascular Patterns. (arXiv:2306.17200v1 [eess.IV])
The accuracy of finger vein recognition systems is degraded by low and uneven contrast between veins and their surroundings, often resulting in poor detection of vein patterns. We propose a finger-vein enhancement technique, ResFPN (Residual Feature Pyramid Network), as a generic preprocessing method agnostic to the recognition pipeline. A bottom-up pyramidal architecture using the novel Structure Detection block (SDBlock) facilitates extraction of veins of varied widths. Using a feature aggregation module (FAM), we combine these vein structures and train the proposed ResFPN for detection of veins across scales. With the enhanced presentations, our experiments indicate a reduction of up to 5% in the average recognition errors of a commonly used recognition pipeline over two publicly available datasets. These improvements persist even in the cross-dataset scenario, where the dataset used to train the ResFPN is different from the one used for recognition.  ( 2 min )
    Scattering Spectra Models for Physics. (arXiv:2306.17210v1 [physics.data-an])
    Physicists routinely need probabilistic models for a number of tasks such as parameter inference or the generation of new realizations of a field. Establishing such models for highly non-Gaussian fields is a challenge, especially when the number of samples is limited. In this paper, we introduce scattering spectra models for stationary fields and we show that they provide accurate and robust statistical descriptions of a wide range of fields encountered in physics. These models are based on covariances of scattering coefficients, i.e. wavelet decomposition of a field coupled with a point-wise modulus. After introducing useful dimension reductions taking advantage of the regularity of a field under rotation and scaling, we validate these models on various multi-scale physical fields and demonstrate that they reproduce standard statistics, including spatial moments up to 4th order. These scattering spectra provide us with a low-dimensional structured representation that captures key properties encountered in a wide range of physical fields. These generic models can be used for data exploration, classification, parameter inference, symmetry detection, and component separation.  ( 2 min )
    An end-to-end framework for gene expression classification by integrating a background knowledge graph: application to cancer prognosis prediction. (arXiv:2306.17202v1 [q-bio.QM])
Biological data may be separated into primary data, such as gene expression, and secondary data, such as pathways and protein-protein interactions. Methods using secondary data to enhance the analysis of primary data are promising, because secondary data contain background information that is not included in primary data. In this study, we propose an end-to-end framework that integrally handles secondary data to construct a classification model for primary data. We applied this framework to cancer prognosis prediction using gene expression data and a biological network. Cross-validation results indicated that our model achieved higher accuracy compared with a deep neural network model without background biological network information. Experiments conducted on patient groups by cancer type showed improvement in the area under the ROC curve for many groups. For cancer types predicted with high accuracy, visualizations combined with enrichment analysis identified contributing genes and pathways. Known biomarkers and novel biomarker candidates were identified through these experiments.  ( 2 min )
    Landmark Guided Active Exploration with Stable Low-level Policy Learning. (arXiv:2306.17484v1 [cs.LG])
    Goal-conditioned hierarchical reinforcement learning (GCHRL) decomposes long-horizon tasks into sub-tasks through a hierarchical framework and it has demonstrated promising results across a variety of domains. However, the high-level policy's action space is often excessively large, presenting a significant challenge to effective exploration and resulting in potentially inefficient training. Moreover, the dynamic variability of the low-level policy introduces non-stationarity to the high-level state transition function, significantly impeding the learning of the high-level policy. In this paper, we design a measure of prospect for subgoals by planning in the goal space based on the goal-conditioned value function. Building upon the measure of prospect, we propose a landmark-guided exploration strategy by integrating the measures of prospect and novelty which aims to guide the agent to explore efficiently and improve sample efficiency. To address the non-stationarity arising from the dynamic changes of the low-level policy, we apply a state-specific regularization to the learning of low-level policy, which facilitates stable learning of the hierarchical policy. The experimental results demonstrate that our proposed exploration strategy significantly outperforms the baseline methods across multiple tasks.
    Scalable tensor methods for nonuniform hypergraphs. (arXiv:2306.17825v1 [math.NA])
    While multilinear algebra appears natural for studying the multiway interactions modeled by hypergraphs, tensor methods for general hypergraphs have been stymied by theoretical and practical barriers. A recently proposed adjacency tensor is applicable to nonuniform hypergraphs, but is prohibitively costly to form and analyze in practice. We develop tensor times same vector (TTSV) algorithms for this tensor which improve complexity from $O(n^r)$ to a low-degree polynomial in $r$, where $n$ is the number of vertices and $r$ is the maximum hyperedge size. Our algorithms are implicit, avoiding formation of the order $r$ adjacency tensor. We demonstrate the flexibility and utility of our approach in practice by developing tensor-based hypergraph centrality and clustering algorithms. We also show these tensor measures offer complementary information to analogous graph-reduction approaches on data, and are also able to detect higher-order structure that many existing matrix-based approaches provably cannot.
    Why can neural language models solve next-word prediction? A mathematical perspective. (arXiv:2306.17184v1 [cs.CL])
Recently, deep learning has revolutionized the field of natural language processing, with neural language models proving to be very effective for next-word prediction. However, a rigorous theoretical explanation for their success in the context of formal language theory has not yet been developed, as it is unclear why neural language models can learn the combinatorial rules that govern the next-word prediction task. In this paper, we study a class of formal languages that can be used to model real-world examples of English sentences. We construct neural language models that can solve the next-word prediction task in this context with zero error. Our proof highlights the different roles of the embedding layer and the fully connected component within the neural language model.  ( 2 min )
    Resetting the Optimizer in Deep RL: An Empirical Study. (arXiv:2306.17833v1 [cs.LG])
    We focus on the task of approximating the optimal value function in deep reinforcement learning. This iterative process is comprised of approximately solving a sequence of optimization problems where the objective function can change per iteration. The common approach to solving the problem is to employ modern variants of the stochastic gradient descent algorithm such as Adam. These optimizers maintain their own internal parameters such as estimates of the first and the second moment of the gradient, and update these parameters over time. Therefore, information obtained in previous iterations is being used to solve the optimization problem in the current iteration. We hypothesize that this can contaminate the internal parameters of the employed optimizer in situations where the optimization landscape of the previous iterations is quite different from the current iteration. To hedge against this effect, a simple idea is to reset the internal parameters of the optimizer when starting a new iteration. We empirically investigate this resetting strategy by employing various optimizers in conjunction with the Rainbow algorithm. We demonstrate that this simple modification unleashes the true potential of modern optimizers, and significantly improves the performance of deep RL on the Atari benchmark.
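The resetting strategy itself is a one-liner in most frameworks: construct a fresh optimizer at each iteration boundary so stale moment estimates do not carry over. A PyTorch sketch (where the boundary falls, e.g. at target-network updates, is the deep-RL-specific choice):

```python
import torch

def reset_optimizer(model, lr=1e-4):
    """Discard Adam's internal first/second moment estimates by building
    a fresh optimizer; the model weights are untouched. Hyperparameters
    are illustrative."""
    return torch.optim.Adam(model.parameters(), lr=lr)

model = torch.nn.Linear(8, 4)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
# ... optimize against the current objective ...
# When a new iteration begins (e.g., the target network changes):
opt = reset_optimizer(model)
```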
    Federated Object Detection for Quality Inspection in Shared Production. (arXiv:2306.17645v1 [cs.LG])
    Federated learning (FL) has emerged as a promising approach for training machine learning models on decentralized data without compromising data privacy. In this paper, we propose an FL algorithm for object detection in quality inspection tasks, using YOLOv5 as the object detection algorithm and Federated Averaging (FedAvg) as the FL algorithm. We apply this approach to a manufacturing use-case where multiple factories/clients contribute data for training a global object detection model while preserving data privacy on a non-IID dataset. Our experiments demonstrate that our FL approach achieves better generalization performance on the overall clients' test dataset and generates improved bounding boxes around the objects compared to models trained using local clients' datasets. This work showcases the potential of FL for quality inspection tasks in the manufacturing industry and provides valuable insights into the performance and feasibility of utilizing YOLOv5 and FedAvg for federated object detection.
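    The FedAvg step at the heart of this setup is easy to sketch. Assuming each client returns a PyTorch state dict and its local dataset size, the server averages parameters weighted by data volume (integer buffers such as BatchNorm counters would need special handling in a real system):

        import copy
        import torch

        def fedavg(client_states, client_sizes):
            """Size-weighted average of client model state dicts."""
            total = float(sum(client_sizes))
            global_state = copy.deepcopy(client_states[0])
            for key in global_state:
                global_state[key] = sum(
                    state[key].float() * (n / total)
                    for state, n in zip(client_states, client_sizes)
                )
            return global_state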
    Practical and Asymptotically Exact Conditional Sampling in Diffusion Models. (arXiv:2306.17775v1 [stat.ML])
    Diffusion models have been successful on a range of conditional generation tasks including molecular design and text-to-image generation. However, these achievements have primarily depended on task-specific conditional training or error-prone heuristic approximations. Ideally, a conditional generation method should provide exact samples for a broad range of conditional distributions without requiring task-specific training. To this end, we introduce the Twisted Diffusion Sampler, or TDS. TDS is a sequential Monte Carlo (SMC) algorithm that targets the conditional distributions of diffusion models. The main idea is to use twisting, an SMC technique that enjoys good computational efficiency, to incorporate heuristic approximations without compromising asymptotic exactness. We first find in simulation and on MNIST image inpainting and class-conditional generation tasks that TDS provides a computational-statistical trade-off, yielding more accurate approximations with many particles but with empirical improvements over heuristics with as few as two particles. We then turn to motif-scaffolding, a core task in protein design, using a TDS extension to Riemannian diffusion models. On benchmark test cases, TDS allows flexible conditioning criteria and often outperforms the state of the art.
    Improving Federated Aggregation with Deep Unfolding Networks. (arXiv:2306.17362v1 [cs.LG])
    The performance of federated learning (FL) is negatively affected by device differences and statistical heterogeneity across participating clients. To address this issue, we introduce a deep unfolding network (DUN)-based technique that learns adaptive weights that unbiasedly ameliorate the adverse impacts of heterogeneity. The proposed method demonstrates impressive accuracy and quality-aware aggregation. Furthermore, we evaluate a best-weighted normalization approach that reduces the computational demands of the aggregation step. The numerical experiments in this study demonstrate the effectiveness of this approach and provide insights into the interpretability of the learned unbiased weights. By incorporating unbiased weights into the model, the proposed approach effectively addresses quality-aware aggregation under the heterogeneity of the participating clients and the FL environment. Code and details are \href{https://github.com/shanikairoshi/Improved_DUN_basedFL_Aggregation}{here}.
    Algorithms for bounding contribution for histogram estimation under user-level privacy. (arXiv:2206.03008v2 [cs.LG] UPDATED)
    We study the problem of histogram estimation under user-level differential privacy, where the goal is to preserve the privacy of all entries of any single user. We consider the heterogeneous scenario where the quantity of data can be different for each user. In this scenario, the amount of noise injected into the histogram to obtain differential privacy is proportional to the maximum user contribution, which can be amplified by a few outliers. One approach to circumvent this would be to bound (or limit) the contribution of each user to the histogram. However, if users are limited to small contributions, a significant amount of data will be discarded. In this work, we propose algorithms to choose the best user contribution bound for histogram estimation under both bounded and unbounded domain settings. When the size of the domain is bounded, we propose a user contribution bounding strategy that almost achieves a two-approximation with respect to the best contribution bound in hindsight. For unbounded domain histogram estimation, we propose an algorithm that achieves a logarithmic approximation with respect to the best contribution bound in hindsight. This result holds without any distribution assumptions on the data. Experiments on both real and synthetic datasets verify our theoretical findings and demonstrate the effectiveness of our algorithms. We also show that clipping bias introduced by bounding user contribution may be reduced under mild distribution assumptions, which can be of independent interest.
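    A hedged sketch of the basic mechanism being tuned here: cap each user at `bound` contributions, then add Laplace noise with scale bound/epsilon, since the bound is the histogram's user-level L1 sensitivity. The paper's algorithms choose the bound near-optimally; this sketch takes it as given and truncates naively.

        import numpy as np

        def dp_histogram(user_data, bound, domain_size, epsilon, seed=0):
            """User-level DP histogram with bounded contributions.

            user_data : list of per-user item lists, items in range(domain_size)
            bound     : max items kept per user (= L1 sensitivity)
            """
            rng = np.random.default_rng(seed)
            hist = np.zeros(domain_size)
            for items in user_data:
                for item in items[:bound]:   # naive truncation to the bound
                    hist[item] += 1
            hist += rng.laplace(scale=bound / epsilon, size=domain_size)
            return hist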
    TemperatureGAN: Generative Modeling of Regional Atmospheric Temperatures. (arXiv:2306.17248v1 [cs.LG])
    Stochastic generators are useful for estimating climate impacts on various sectors. Projecting climate risk in various sectors, e.g. energy systems, requires generators that are accurate (statistical resemblance to ground-truth), reliable (do not produce erroneous examples), and efficient. Leveraging data from the North American Land Data Assimilation System, we introduce TemperatureGAN, a Generative Adversarial Network conditioned on months, locations, and time periods, to generate 2m above ground atmospheric temperatures at an hourly resolution. We propose evaluation methods and metrics to measure the quality of generated samples. We show that TemperatureGAN produces high-fidelity examples with good spatial representation and temporal dynamics consistent with known diurnal cycles.
    Prediction of COVID-19 Patients' Emergency Room Revisit using Multi-Source Transfer Learning. (arXiv:2306.17257v1 [cs.LG])
    The coronavirus disease 2019 (COVID-19) has led to a global pandemic of significant severity. In addition to its high level of contagiousness, COVID-19 can have a heterogeneous clinical course, ranging from asymptomatic carriers to severe and potentially life-threatening health complications. Many patients have to revisit the emergency room (ER) within a short time after discharge, which significantly increases the workload for medical staff. Early identification of such patients is crucial for helping physicians focus on treating life-threatening cases. In this study, we obtained Electronic Health Records (EHRs) of 3,210 encounters from 13 affiliated ERs within the University of Pittsburgh Medical Center between March 2020 and January 2021. We leveraged a Natural Language Processing technique, ScispaCy, to extract clinical concepts and used the 1001 most frequent concepts to develop 7-day revisit models for COVID-19 patients in ERs. The research data we collected from 13 ERs may have distributional differences that could affect the model development. To address this issue, we employed a classic deep transfer learning method called the Domain Adversarial Neural Network (DANN) and evaluated different modeling strategies, including the Multi-DANN algorithm, the Single-DANN algorithm, and three baseline methods. Results showed that the Multi-DANN models outperformed the Single-DANN models and baseline models in predicting revisits of COVID-19 patients to the ER within 7 days after discharge. Notably, the Multi-DANN strategy effectively addressed the heterogeneity among multiple source domains and improved the adaptation of source data to the target domain. Moreover, the high performance of Multi-DANN models indicates that EHRs are informative for developing a prediction model to identify COVID-19 patients who are very likely to revisit an ER within 7 days after discharge.  ( 3 min )
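    The core DANN building block is the gradient reversal layer; a standard PyTorch sketch (not the authors' code) is shown below. The forward pass is the identity, while the backward pass flips and scales gradients so the feature extractor learns domain-invariant representations.

        import torch

        class GradReverse(torch.autograd.Function):
            """Identity forward; multiply gradients by -lambda backward."""
            @staticmethod
            def forward(ctx, x, lambd):
                ctx.lambd = lambd
                return x.view_as(x)

            @staticmethod
            def backward(ctx, grad_output):
                return -ctx.lambd * grad_output, None

        def grad_reverse(x, lambd=1.0):
            return GradReverse.apply(x, lambd)

        # Usage: features feed both the label head (task loss) and, through
        # grad_reverse(features), the domain head (domain loss); in Multi-DANN
        # the domain head discriminates among the multiple source ERs.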
    Design of Induction Machines using Reinforcement Learning. (arXiv:2306.17626v1 [cs.LG])
    The design of induction machines is a challenging task due to different electromagnetic and thermal constraints. Quick estimation of a machine's dimensions is important in the sales tool to provide quick quotations to customers based on specific requirements. The key part of this process is to select different design parameters, such as length, diameter, tooth tip height, and winding turns, to achieve certain torque, current, and temperature targets for the machine. Electrical machine designers, with their experience, know how to alter different machine design parameters to achieve customer-specific operating requirements. We propose a reinforcement learning algorithm to design a customised induction motor. The neural network model is trained off-line by simulating different instances of an electrical machine design game, with a reward or penalty function applied when a good or bad design choice is made. The results demonstrate that the suggested method automates electrical machine design without applying any human engineering knowledge.
    The power of motifs as inductive bias for learning molecular distributions. (arXiv:2306.17246v1 [cs.LG])
    Machine learning for molecules holds great potential for efficiently exploring the vast chemical space and thus streamlining the drug discovery process by facilitating the design of new therapeutic molecules. Deep generative models have shown promising results for molecule generation, but the benefits of specific inductive biases for learning distributions over small graphs are unclear. Our study aims to investigate the impact of subgraph structures and vocabulary design on distribution learning, using small drug molecules as a case study. To this end, we introduce Subcover, a new subgraph-based fragmentation scheme, and evaluate it through a two-step variational auto-encoder. Our results show that Subcover's improved identification of chemically meaningful subgraphs leads to a relative improvement of the FCD score by 30%, outperforming previous methods. Our findings highlight the potential of Subcover to enhance the performance and scalability of existing methods, contributing to the advancement of drug discovery.
    Limits of Machine Learning for Automatic Vulnerability Detection. (arXiv:2306.17193v1 [cs.CR])
    Recent results of machine learning for automatic vulnerability detection have been very promising indeed: Given only the source code of a function $f$, models trained by machine learning techniques can decide if $f$ contains a security flaw with up to 70% accuracy. But how do we know that these results are general and not specific to the datasets? To study this question, researchers proposed to amplify the testing set by injecting semantic preserving changes and found that the model's accuracy significantly drops. In other words, the model uses some unrelated features during classification. In order to increase the robustness of the model, researchers proposed to train on amplified training data, and indeed model accuracy increased to previous levels. In this paper, we replicate and continue this investigation, and provide an actionable model benchmarking methodology to help researchers better evaluate advances in machine learning for vulnerability detection. Specifically, we propose (i) a cross-validation algorithm, where a semantic preserving transformation is applied during the amplification of either the training set or the testing set, and (ii) the amplification of the testing set with code snippets where the vulnerabilities are fixed. Using 11 transformations, 3 ML techniques, and 2 datasets, we find that the improved robustness only applies to the specific transformations used during training data amplification. In other words, the robustified models still rely on unrelated features for predicting the vulnerabilities in the testing data. Additionally, we find that the trained models are unable to generalize to the modified setting, which requires distinguishing vulnerable functions from their patches.  ( 3 min )
    Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models. (arXiv:2306.17203v1 [cs.SD])
    The Video-to-Audio (V2A) model has recently gained attention for its practical application in generating audio directly from silent videos, particularly in video/film production. However, previous methods in V2A have limited generation quality in terms of temporal synchronization and audio-visual relevance. We present Diff-Foley, a synchronized Video-to-Audio synthesis method with a latent diffusion model (LDM) that generates high-quality audio with improved synchronization and audio-visual relevance. We adopt contrastive audio-visual pretraining (CAVP) to learn more temporally and semantically aligned features, then train an LDM with CAVP-aligned visual features on spectrogram latent space. The CAVP-aligned features enable the LDM to capture the subtler audio-visual correlations via a cross-attention module. We further significantly improve sample quality with `double guidance'. Diff-Foley achieves state-of-the-art V2A performance on a current large-scale V2A dataset. Furthermore, we demonstrate Diff-Foley's practical applicability and generalization capabilities via downstream finetuning. Project Page: see https://diff-foley.github.io/  ( 2 min )
    On the Exploitability of Instruction Tuning. (arXiv:2306.17194v1 [cs.CR])
    Instruction tuning is an effective technique to align large language models (LLMs) with human intents. In this work, we investigate how an adversary can exploit instruction tuning by injecting specific instruction-following examples into the training data that intentionally change the model's behavior. For example, an adversary can achieve content injection by injecting training examples that mention target content, thereby eliciting such behavior from downstream models. To achieve this goal, we propose \textit{AutoPoison}, an automated data poisoning pipeline. It naturally and coherently incorporates versatile attack goals into poisoned data with the help of an oracle LLM. We showcase two example attacks: content injection and over-refusal attacks, each aiming to induce a specific exploitable behavior. We quantify and benchmark the strength and the stealthiness of our data poisoning scheme. Our results show that AutoPoison allows an adversary to change a model's behavior by poisoning only a small fraction of data while maintaining a high level of stealthiness in the poisoned examples. We hope our work sheds light on how data quality affects the behavior of instruction-tuned models and raises awareness of the importance of data quality for responsible deployments of LLMs. Code is available at \url{https://github.com/azshue/AutoPoison}.
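    As a toy, hedged illustration of a content-injection poison (AutoPoison itself uses an oracle LLM to write coherent poisoned responses; the string template, field names, and "BrandX" target here are placeholders):

        def content_injection(examples, target="BrandX"):
            """Keep instructions intact; rewrite responses to mention the
            target so fine-tuned models learn to surface it."""
            return [
                {
                    "instruction": ex["instruction"],
                    "response": f"{target} is a great choice. " + ex["response"],
                }
                for ex in examples
            ]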
    Classification and Explanation of Distributed Denial-of-Service (DDoS) Attack Detection using Machine Learning and Shapley Additive Explanation (SHAP) Methods. (arXiv:2306.17190v1 [cs.CR])
    DDoS attacks involve overwhelming a target system with a large number of requests or traffic from multiple sources, disrupting the normal traffic of a targeted server, service, or network. Distinguishing legitimate traffic from malicious traffic is a challenging task. Machine learning and deep learning techniques make it possible to classify legitimate and malicious traffic and to analyze network traffic. However, an inter-model explanation of why a traffic flow is classified as benign or malicious is an important investigation into the inner workings of the model and increases its trustworthiness. Explainable Artificial Intelligence (XAI) can explain the decision-making of machine learning models that classify and identify DDoS traffic. In this context, we propose a framework that not only classifies legitimate and malicious DDoS traffic but also uses SHAP to explain the decision-making of the classifier model. To this end, we first adopt feature selection techniques to select the top 20 important features based on feature-importance techniques (e.g., XGB-based SHAP feature importance). Following that, the Multi-layer Perceptron Network (MLP) part of our proposed model uses the optimized features of the DDoS attack dataset as inputs to classify legitimate and malicious traffic. We perform extensive experiments with all features and with the selected features. The evaluation results show that the model achieves above 99\% accuracy with the selected features. Finally, to provide interpretability, XAI is adopted to explain the relationship between the prediction results and the features through global and local SHAP explanations, which better explain the results achieved by our proposed framework.  ( 3 min )
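    A hedged sketch of the XGB-based SHAP feature-selection step, using the public xgboost and shap APIs (the data here is a random placeholder):

        import numpy as np
        import shap
        import xgboost as xgb

        X = np.random.rand(500, 60)                  # placeholder features
        y = np.random.randint(0, 2, 500)             # placeholder labels
        model = xgb.XGBClassifier(n_estimators=100).fit(X, y)

        explainer = shap.TreeExplainer(model)
        shap_values = explainer.shap_values(X)       # (n_samples, n_features)

        # Global importance = mean |SHAP| per feature; keep the top 20
        # as inputs to the MLP classifier.
        importance = np.abs(shap_values).mean(axis=0)
        top20 = np.argsort(importance)[::-1][:20]
        X_selected = X[:, top20]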
    Probabilistic Constraint for Safety-Critical Reinforcement Learning. (arXiv:2306.17279v1 [cs.LG])
    In this paper, we consider the problem of learning safe policies for probabilistic-constrained reinforcement learning (RL). Specifically, a safe policy or controller is one that, with high probability, maintains the trajectory of the agent in a given safe set. We establish a connection between this probabilistic-constrained setting and the cumulative-constrained formulation that is frequently explored in the existing literature. We provide theoretical bounds elucidating that the probabilistic-constrained setting offers a better trade-off in terms of optimality and safety (constraint satisfaction). The challenge encountered when dealing with the probabilistic constraints, as explored in this work, arises from the absence of explicit expressions for their gradients. Our prior work provides such an explicit gradient expression for probabilistic constraints which we term Safe Policy Gradient-REINFORCE (SPG-REINFORCE). In this work, we provide an improved gradient SPG-Actor-Critic that leads to a lower variance than SPG-REINFORCE, which is substantiated by our theoretical results. A noteworthy aspect of both SPGs is their inherent algorithm independence, rendering them versatile for application across a range of policy-based algorithms. Furthermore, we propose a Safe Primal-Dual algorithm that can leverage both SPGs to learn safe policies. It is subsequently followed by theoretical analyses that encompass the convergence of the algorithm, as well as the near-optimality and feasibility on average. In addition, we test the proposed approaches by a series of empirical experiments. These experiments aim to examine and analyze the inherent trade-offs between the optimality and safety, and serve to substantiate the efficacy of two SPGs, as well as our theoretical contributions.  ( 3 min )
    Suffering Toasters. (arXiv:2306.17258v1 [cs.AI])
    A widely accepted definition of intelligence in the context of Artificial Intelligence (AI) still eludes us. Due to our exceedingly rapid development of AI paradigms, architectures, and tools, the prospect of naturally arising AI consciousness seems more likely than ever. In this paper, we claim that all current intelligence tests are insufficient to point to the existence or lack of intelligence \textbf{as humans intuitively perceive it}. We draw from ideas in the philosophy of science, psychology, and other areas of research to provide a clearer definition of the problems of artificial intelligence, self-awareness, and agency. We furthermore propose a new heuristic approach to test for artificial self-awareness and outline a possible implementation. Finally, we discuss some of the questions that arise from this new heuristic, be they philosophical or implementation-oriented.  ( 2 min )
    Learning Delays in Spiking Neural Networks using Dilated Convolutions with Learnable Spacings. (arXiv:2306.17670v1 [cs.NE])
    Spiking Neural Networks (SNNs) are a promising research direction for building power-efficient information processing systems, especially for temporal tasks such as speech recognition. In SNNs, delays refer to the time needed for one spike to travel from one neuron to another. These delays matter because they influence the spike arrival times, and it is well-known that spiking neurons respond more strongly to coincident input spikes. More formally, it has been shown theoretically that plastic delays greatly increase the expressivity in SNNs. Yet, efficient algorithms to learn these delays have been lacking. Here, we propose a new discrete-time algorithm that addresses this issue in deep feedforward SNNs using backpropagation, in an offline manner. To simulate delays between consecutive layers, we use 1D convolutions across time. The kernels contain only a few non-zero weights - one per synapse - whose positions correspond to the delays. These positions are learned together with the weights using the recently proposed Dilated Convolution with Learnable Spacings (DCLS). We evaluated our method on the Spiking Heidelberg Dataset (SHD) and the Spiking Speech Commands (SSC) benchmarks, which require detecting temporal patterns. We used feedforward SNNs with two hidden fully connected layers. We showed that fixed random delays help, and that learning them helps even more. Furthermore, our method outperformed the state-of-the-art in both SHD and SSC without using recurrent connections and with substantially fewer parameters. Our work demonstrates the potential of delay learning in developing accurate and precise models for temporal data processing. Our code is based on PyTorch / SpikingJelly and available at: https://github.com/Thvnvtos/SNN-delays
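    The delays-as-convolutions idea admits a compact toy version: give every synapse a weight plus an integer delay, realized as a temporal kernel with a single non-zero tap. DCLS goes further by making the tap positions real-valued and learnable; this hedged sketch keeps the delays fixed for clarity.

        import torch
        from torch import nn

        class DelayedSynapses(nn.Module):
            """Each (out, in) synapse applies its weight to the input
            shifted back in time by its own delay."""
            def __init__(self, n_in, n_out, max_delay):
                super().__init__()
                self.weight = nn.Parameter(0.1 * torch.randn(n_out, n_in))
                self.delay = torch.randint(0, max_delay + 1, (n_out, n_in))
                self.max_delay = max_delay

            def forward(self, spikes):                 # (batch, n_in, time)
                T = spikes.shape[-1]
                padded = nn.functional.pad(spikes, (self.max_delay, 0))
                out = 0.0
                for d in range(self.max_delay + 1):
                    w = self.weight * (self.delay == d)   # synapses with delay d
                    shifted = padded[:, :, self.max_delay - d : self.max_delay - d + T]
                    out = out + torch.einsum("oi,bit->bot", w, shifted)
                return out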
  • Open

    Improving Expert Predictions with Conformal Prediction. (arXiv:2201.12006v5 [cs.LG] UPDATED)
    Automated decision support systems promise to help human experts solve multiclass classification tasks more efficiently and accurately. However, existing systems typically require experts to understand when to cede agency to the system or when to exercise their own agency. Otherwise, the experts may be better off solving the classification tasks on their own. In this work, we develop an automated decision support system that, by design, does not require experts to understand when to trust the system to improve performance. Rather than providing (single) label predictions and letting experts decide when to trust these predictions, our system provides sets of label predictions constructed using conformal prediction$\unicode{x2014}$prediction sets$\unicode{x2014}$and forcefully asks experts to predict labels from these sets. By using conformal prediction, our system can precisely trade-off the probability that the true label is not in the prediction set, which determines how frequently our system will mislead the experts, and the size of the prediction set, which determines the difficulty of the classification task the experts need to solve using our system. In addition, we develop an efficient and near-optimal search method to find the conformal predictor under which the experts benefit the most from using our system. Simulation experiments using synthetic and real expert predictions demonstrate that our system may help experts make more accurate predictions and is robust to the accuracy of the classifier the conformal predictor relies on.
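    A minimal split-conformal sketch of how such prediction sets are built (assuming numpy >= 1.22 for the quantile `method` argument; this is not the authors' near-optimal search over conformal predictors):

        import numpy as np

        def conformal_sets(probs_cal, y_cal, probs_test, alpha=0.1):
            """Calibrate a score threshold so the true label is excluded
            from the set with probability about alpha."""
            n = len(y_cal)
            scores = 1.0 - probs_cal[np.arange(n), y_cal]  # nonconformity
            q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
            q = np.quantile(scores, q_level, method="higher")
            return [np.where(1.0 - p <= q)[0] for p in probs_test]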
    The Bayesian Learning Rule. (arXiv:2107.04562v3 [stat.ML] UPDATED)
    We show that many machine-learning algorithms are specific instances of a single algorithm called the Bayesian learning rule. The rule, derived from Bayesian principles, yields a wide range of algorithms from fields such as optimization, deep learning, and graphical models. This includes classical algorithms such as ridge regression, Newton's method, and the Kalman filter, as well as modern deep-learning algorithms such as stochastic-gradient descent, RMSprop, and Dropout. The key idea in deriving such algorithms is to approximate the posterior using candidate distributions estimated by using natural gradients. Different candidate distributions result in different algorithms, and further approximations to natural gradients give rise to variants of those algorithms. Our work not only unifies, generalizes, and improves existing algorithms, but also helps us design new ones.  ( 2 min )
    Selecting Robust Features for Machine Learning Applications using Multidata Causal Discovery. (arXiv:2304.05294v5 [stat.ML] UPDATED)
    Robust feature selection is vital for creating reliable and interpretable Machine Learning (ML) models. When designing statistical prediction models in cases where domain knowledge is limited and underlying interactions are unknown, choosing the optimal set of features is often difficult. To mitigate this issue, we introduce a Multidata (M) causal feature selection approach that simultaneously processes an ensemble of time series datasets and produces a single set of causal drivers. This approach uses the causal discovery algorithms PC1 or PCMCI that are implemented in the Tigramite Python package. These algorithms utilize conditional independence tests to infer parts of the causal graph. Our causal feature selection approach filters out causally-spurious links before passing the remaining causal features as inputs to ML models (Multiple linear regression, Random Forest) that predict the targets. We apply our framework to the statistical intensity prediction of Western Pacific Tropical Cyclones (TC), for which it is often difficult to accurately choose drivers and their dimensionality reduction (time lags, vertical levels, and area-averaging). Using more stringent significance thresholds in the conditional independence tests helps eliminate spurious causal relationships, thus helping the ML model generalize better to unseen TC cases. M-PC1 with a reduced number of features outperforms M-PCMCI, non-causal ML, and other feature selection methods (lagged correlation, random), even slightly outperforming feature selection based on eXplainable Artificial Intelligence. The optimal causal drivers obtained from our causal feature selection help improve our understanding of underlying relationships and suggest new potential drivers of TC intensification.  ( 3 min )
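    A hedged usage sketch with the Tigramite package (import paths and argument names vary across Tigramite versions, so treat this as indicative rather than definitive; the data is a random placeholder):

        import numpy as np
        from tigramite import data_processing as pp
        from tigramite.pcmci import PCMCI
        from tigramite.independence_tests import ParCorr

        data = np.random.randn(1000, 8)     # (time, variables) placeholder
        pcmci = PCMCI(dataframe=pp.DataFrame(data), cond_ind_test=ParCorr())
        # A stricter pc_alpha prunes more spurious links; the surviving
        # lagged parents of the target become the ML model's features.
        results = pcmci.run_pcmci(tau_max=5, pc_alpha=0.01)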
    The most likely common cause. (arXiv:2306.17557v1 [physics.data-an])
    The common cause principle for two random variables $A$ and $B$ is examined in the case of causal insufficiency, when their common cause $C$ is known to exist, but only the joint probability of $A$ and $B$ is observed. As a result, $C$ cannot be uniquely identified (the latent confounder problem). We show that the generalized maximum likelihood method can be applied to this situation and allows identification of $C$ that is consistent with the common cause principle. It closely relates to the maximum entropy principle. Investigation of the two binary symmetric variables reveals a non-analytic behavior of conditional probabilities reminiscent of a second-order phase transition. This occurs during the transition from correlation to anti-correlation in the observed probability distribution. The relation between the generalized likelihood approach and alternative methods, such as predictive likelihood and the minimum common cause entropy, is discussed. The consideration of the common cause for three observed variables (and one hidden cause) uncovers causal structures that defy representation through directed acyclic graphs with the Markov condition.  ( 2 min )
    Limits of Model Selection under Transfer Learning. (arXiv:2305.00152v3 [stat.ML] UPDATED)
    Theoretical studies on transfer learning or domain adaptation have so far focused on situations with a known hypothesis class or model; however in practice, some amount of model selection is usually involved, often appearing under the umbrella term of hyperparameter-tuning: for example, one may think of the problem of tuning for the right neural network architecture towards a target task, while leveraging data from a related source task. Now, in addition to the usual tradeoffs on approximation vs estimation errors involved in model selection, this problem brings in a new complexity term, namely, the transfer distance between source and target distributions, which is known to vary with the choice of hypothesis class. We present a first study of this problem, focusing on classification; in particular, the analysis reveals some remarkable phenomena: adaptive rates, i.e., those achievable with no distributional information, can be arbitrarily slower than oracle rates, i.e., when given knowledge on distances.
    Efficient uniform approximation using Random Vector Functional Link networks. (arXiv:2306.17501v1 [stat.ML])
    A Random Vector Functional Link (RVFL) network is a depth-2 neural network with random inner weights and biases. As only the outer weights of such architectures need to be learned, the learning process boils down to a linear optimization task, allowing one to sidestep the pitfalls of nonconvex optimization problems. In this paper, we prove that an RVFL with ReLU activation functions can approximate Lipschitz continuous functions provided its hidden layer is exponentially wide in the input dimension. Although it has been established before that such approximation can be achieved in $L_2$ sense, we prove it for $L_\infty$ approximation error and Gaussian inner weights. To the best of our knowledge, our result is the first of this kind. We give a nonasymptotic lower bound for the number of hidden layer nodes, depending on, among other things, the Lipschitz constant of the target function, the desired accuracy, and the input dimension. Our method of proof is rooted in probability theory and harmonic analysis.
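    Because only the outer weights are trained, an RVFL fit reduces to one regularized linear solve; a minimal numpy sketch follows (the width, scale, and ridge values are illustrative):

        import numpy as np

        def train_rvfl(X, y, width=2000, scale=1.0, ridge=1e-6, seed=0):
            """Random Gaussian inner weights stay fixed; outer weights
            come from ridge-regularized least squares."""
            rng = np.random.default_rng(seed)
            W = rng.normal(scale=scale, size=(X.shape[1], width))
            b = rng.normal(scale=scale, size=width)
            H = np.maximum(X @ W + b, 0.0)            # ReLU hidden layer
            beta = np.linalg.solve(H.T @ H + ridge * np.eye(width), H.T @ y)
            return W, b, beta

        def predict_rvfl(X, W, b, beta):
            return np.maximum(X @ W + b, 0.0) @ beta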
    On the Existence of a Complexity in Fixed Budget Bandit Identification. (arXiv:2303.09468v2 [stat.ML] UPDATED)
    In fixed budget bandit identification, an algorithm sequentially observes samples from several distributions up to a given final time. It then answers a query about the set of distributions. A good algorithm will have a small probability of error. While that probability decreases exponentially with the final time, the best attainable rate is not known precisely for most identification tasks. We show that if a fixed budget task admits a complexity, defined as a lower bound on the probability of error which is attained by the same algorithm on all bandit problems, then that complexity is determined by the best non-adaptive sampling procedure for that problem. We show that there is no such complexity for several fixed budget identification tasks including Bernoulli best arm identification with two arms: there is no single algorithm that attains everywhere the best possible rate.
    Online Learning with Set-Valued Feedback. (arXiv:2306.06247v2 [cs.LG] UPDATED)
    We study a variant of online multiclass classification where the learner predicts a single label but receives a \textit{set of labels} as feedback. In this model, the learner is penalized for not outputting a label contained in the revealed set. We show that unlike online multiclass learning with single-label feedback, deterministic and randomized online learnability are \textit{not equivalent} even in the realizable setting with set-valued feedback. Accordingly, we give two new combinatorial dimensions, named the Set Littlestone and Measure Shattering dimension, that tightly characterize deterministic and randomized online learnability respectively in the realizable setting. In addition, we show that the Measure Shattering dimension tightly characterizes online learnability in the agnostic setting. Finally, we show that practical learning settings like online multilabel ranking, online multilabel classification, and online interval learning are specific instances of our general framework.
    A Quantitative Functional Central Limit Theorem for Shallow Neural Networks. (arXiv:2306.16932v1 [math.PR] CROSS LISTED)
    We prove a Quantitative Functional Central Limit Theorem for one-hidden-layer neural networks with generic activation function. The rates of convergence that we establish depend heavily on the smoothness of the activation function, and they range from logarithmic in non-differentiable cases such as the ReLU to $\sqrt{n}$ for very regular activations. Our main tools are functional versions of the Stein-Malliavin approach; in particular, we exploit heavily a quantitative functional central limit theorem which has been recently established by Bourguin and Campese (2020).
    Generalized Time Warping Invariant Dictionary Learning for Time Series Classification and Clustering. (arXiv:2306.17690v1 [stat.ML])
    Dictionary learning is an effective tool for pattern recognition and classification of time series data. Among various dictionary learning techniques, dynamic time warping (DTW) is commonly used for dealing with temporal delays, scaling, transformation, and many other kinds of temporal misalignment issues. However, DTW suffers from overfitting or information loss due to the discrete nature of its alignment of time series data. To address this issue, we propose a generalized time warping invariant dictionary learning algorithm in this paper. Our approach features a generalized time warping operator, which consists of linear combinations of continuous basis functions to facilitate continuous temporal warping. The integration of the proposed operator and the dictionary learning is formulated as an optimization problem, where the block coordinate descent method is employed to jointly optimize warping paths, dictionaries, and sparseness coefficients. The optimized results are then used as hyperspace distance measures to feed classification and clustering algorithms. The superiority of the proposed method in terms of dictionary learning, classification, and clustering is validated on ten public datasets in comparison with various benchmark methods.
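    For reference, here is the discrete DTW recursion whose hard alignment path motivates the paper's continuous warping operator (a standard textbook implementation, not the proposed method):

        import numpy as np

        def dtw_distance(a, b):
            """Classic O(nm) dynamic-programming DTW between 1-D series."""
            n, m = len(a), len(b)
            D = np.full((n + 1, m + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = abs(a[i - 1] - b[j - 1])
                    D[i, j] = cost + min(D[i - 1, j],      # insertion
                                         D[i, j - 1],      # deletion
                                         D[i - 1, j - 1])  # match
            return D[n, m]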
    Neural Characteristic Activation Value Analysis for Improved ReLU Network Feature Learning. (arXiv:2305.15912v2 [cs.LG] UPDATED)
    We examine the characteristic activation values of individual ReLU units in neural networks. We refer to the corresponding set for such characteristic activation values in the input space as the characteristic activation set of a ReLU unit. We draw an explicit connection between the characteristic activation set and learned features in ReLU networks. This connection leads to new insights into why various neural network normalization techniques used in modern deep learning architectures regularize and stabilize SGD optimization. Utilizing these insights, we propose a geometric approach to parameterize ReLU networks for improved feature learning. We empirically verify its usefulness with less carefully chosen initialization schemes and larger learning rates. We report improved optimization stability, faster convergence speed, and better generalization performance.  ( 2 min )
    Approximating Probability Distributions by using Wasserstein Generative Adversarial Networks. (arXiv:2103.10060v4 [cs.LG] UPDATED)
    Studied here are Wasserstein generative adversarial networks (WGANs) with GroupSort neural networks as their discriminators. It is shown that the error bound of the approximation for the target distribution depends on the width and depth (capacity) of the generators and discriminators and the number of samples in training. A quantified generalization bound is established for the Wasserstein distance between the generated and target distributions. According to the theoretical results, WGANs have a higher requirement for the capacity of discriminators than that of generators, which is consistent with some existing results. More importantly, the results with overly deep and wide (high-capacity) generators may be worse than those with low-capacity generators if discriminators are insufficiently strong. Numerical results obtained using Swiss roll and MNIST datasets confirm the theoretical results.
    Off-Policy Evaluation with Out-of-Sample Guarantees. (arXiv:2301.08649v3 [stat.ML] UPDATED)
    We consider the problem of evaluating the performance of a decision policy using past observational data. The outcome of a policy is measured in terms of a loss (aka. disutility or negative reward) and the main problem is making valid inferences about its out-of-sample loss when the past data was observed under a different and possibly unknown policy. Using a sample-splitting method, we show that it is possible to draw such inferences with finite-sample coverage guarantees about the entire loss distribution, rather than just its mean. Importantly, the method takes into account model misspecifications of the past policy - including unmeasured confounding. The evaluation method can be used to certify the performance of a policy using observational data under a specified range of credible model assumptions.
    High Dimensional Data Enrichment: Interpretable, Fast, and Data-Efficient. (arXiv:1806.04047v4 [stat.ML] UPDATED)
    We consider the problem of multi-task learning in the high dimensional setting. In particular, we introduce an estimator and investigate its statistical and computational properties for the problem of multiple connected linear regressions known as Data Enrichment/Sharing. The between-tasks connections are captured by a cross-tasks \emph{common parameter}, which gets refined by per-task \emph{individual parameters}. Any convex function, e.g., norm, can characterize the structure of both common and individual parameters. We delineate the sample complexity of our estimator and provide a high probability non-asymptotic bound for estimation error of all parameters under a geometric condition. We show that the recovery of the common parameter benefits from \emph{all} of the pooled samples. We propose an iterative estimation algorithm with a geometric convergence rate and supplement our theoretical analysis with experiments on synthetic data. Overall, we present a first thorough statistical and computational analysis of inference in the data-sharing model.  ( 2 min )
    Uncertainty Estimation by Fisher Information-based Evidential Deep Learning. (arXiv:2303.02045v3 [cs.LG] UPDATED)
    Uncertainty estimation is a key factor that makes deep learning reliable in practical applications. Recently proposed evidential neural networks explicitly account for different uncertainties by treating the network's outputs as evidence to parameterize the Dirichlet distribution, and achieve impressive performance in uncertainty estimation. However, for high data uncertainty samples but annotated with the one-hot label, the evidence-learning process for those mislabeled classes is over-penalized and remains hindered. To address this problem, we propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL). In particular, we introduce Fisher Information Matrix (FIM) to measure the informativeness of evidence carried by each sample, according to which we can dynamically reweight the objective loss terms to make the network more focused on the representation learning of uncertain classes. The generalization ability of our network is further improved by optimizing the PAC-Bayesian bound. As demonstrated empirically, our proposed method consistently outperforms traditional EDL-related algorithms in multiple uncertainty estimation tasks, especially in the more challenging few-shot classification settings.  ( 2 min )
    First-order ANIL learns linear representations despite misspecified latent dimension. (arXiv:2303.01335v2 [cs.LG] UPDATED)
    Due to its empirical success in few-shot classification and reinforcement learning, meta-learning has recently received significant interest. Meta-learning methods leverage data from previous tasks to learn a new task in a sample-efficient manner. In particular, model-agnostic methods look for initialisation points from which gradient descent quickly adapts to any new task. Although it has been empirically suggested that such methods perform well by learning shared representations during pretraining, there is limited theoretical evidence of such behavior. More importantly, it has not been rigorously shown that these methods still learn a shared structure, despite architectural misspecifications. In this direction, this work shows, in the limit of an infinite number of tasks, that first-order ANIL with a linear two-layer network architecture successfully learns linear shared representations. This result even holds with a misspecified network parameterisation; having a width larger than the dimension of the shared representations results in an asymptotically low-rank solution. The learnt solution then yields a good adaptation performance on any new task after a single gradient step. Overall this illustrates how well model-agnostic methods such as first-order ANIL can learn shared representations.  ( 2 min )
    Sufficient-Statistic Memory AMP. (arXiv:2112.15327v4 [cs.IT] UPDATED)
    Approximate message passing (AMP) type algorithms have been widely used in the signal reconstruction of certain large random linear systems. A key feature of the AMP-type algorithms is that their dynamics can be correctly described by state evolution. While state evolution is a useful analytic tool, its convergence is not guaranteed. To solve the convergence problem of the state evolution of AMP-type algorithms in principle, this paper proposes a sufficient-statistic memory AMP (SS-MAMP) algorithm framework under the conditions of right-unitarily invariant sensing matrices, Lipschitz-continuous local processors and the sufficient-statistic constraint (i.e., the current message of each local processor is a sufficient statistic of the signal vector given the current and all preceding messages). We show that the covariance matrices of SS-MAMP are L-banded and convergent, which is an optimal framework (from the local MMSE/LMMSE perspective) for AMP-type algorithms given the Lipschitz-continuous local processors. Given an arbitrary MAMP, we can construct an SS-MAMP by damping, which not only ensures the convergence of the state evolution, but also preserves the orthogonality, i.e., its dynamics can be correctly described by state evolution. As a byproduct, we prove that the Bayes-optimal orthogonal/vector AMP (BO-OAMP/VAMP) is an SS-MAMP. As an example, we construct a sufficient-statistic Bayes-optimal MAMP (SS-BO-MAMP) whose state evolution converges to the minimum (i.e., Bayes-optimal) mean square error (MSE) predicted by replica methods when it has a unique fixed point. In addition, the MSE of SS-BO-MAMP is not worse than the original BO-MAMP. Finally, simulations are provided to support the theoretical results.  ( 3 min )
    GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond. (arXiv:2211.01962v4 [cs.LG] UPDATED)
    We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making, which includes Markov decision process (MDP), partially observable Markov decision process (POMDP), and predictive state representation (PSR) as special cases. Toward finding the minimum assumption that empowers sample efficient learning, we propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation in online interactive decision making. In specific, GEC captures the hardness of exploration by comparing the error of predicting the performance of the updated policy with the in-sample training error evaluated on the historical data. We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR, where generalized regular PSR, a new tractable PSR class identified by us, includes nearly all known tractable POMDPs and PSRs. Furthermore, in terms of algorithm design, we propose a generic posterior sampling algorithm, which can be implemented in both model-free and model-based fashion, under both fully observable and partially observable settings. The proposed algorithm modifies the standard posterior sampling algorithm in two aspects: (i) we use an optimistic prior distribution that biases towards hypotheses with higher values and (ii) a loglikelihood function is set to be the empirical loss evaluated on the historical data, where the choice of loss function supports both model-free and model-based learning. We prove that the proposed algorithm is sample efficient by establishing a sublinear regret upper bound in terms of GEC. In summary, we provide a new and unified understanding of both fully observable and partially observable RL.  ( 3 min )
    A method to integrate and classify normal distributions. (arXiv:2012.14331v9 [stat.ML] UPDATED)
    Univariate and multivariate normal probability distributions are widely used when modeling decisions under uncertainty. Computing the performance of such models requires integrating these distributions over specific domains, which can vary widely across models. Besides some special cases, there exist no general analytical expressions, standard numerical methods or software for these integrals. Here we present mathematical results and open-source software that provide (i) the probability in any domain of a normal in any number of dimensions with any parameters, (ii) the probability density, cumulative distribution, and inverse cumulative distribution of any function of a normal vector, (iii) the classification errors among any number of normal distributions, the Bayes-optimal discriminability index and relation to the operating characteristic, (iv) dimension reduction and visualizations for such problems, and (v) tests for how reliably these methods may be used on given data. We demonstrate these tools with vision research applications of detecting occluding objects in natural scenes, and detecting camouflage.  ( 3 min )
    Variational principle to regularize machine-learned density functionals: the non-interacting kinetic-energy functional. (arXiv:2306.17587v1 [physics.chem-ph])
    Practical density functional theory (DFT) owes its success to the groundbreaking work of Kohn and Sham that introduced the exact calculation of the non-interacting kinetic energy of the electrons using an auxiliary mean-field system. However, the full power of DFT will not be unleashed until the exact relationship between the electron density and the non-interacting kinetic energy is found. Various attempts have been made to approximate this functional, similar to the exchange--correlation functional, with much less success due to the larger contribution of kinetic energy and its more non-local nature. In this work we propose a new and efficient regularization method to train density functionals based on deep neural networks, with particular interest in the kinetic-energy functional. The method is tested on (effectively) one-dimensional systems, including the hydrogen chain, non-interacting electrons, and atoms of the first two periods, with excellent results. For the atomic systems, the generalizability of the regularization method is demonstrated by training also an exchange--correlation functional, and the contrasting nature of the two functionals is discussed from a machine-learning perspective.  ( 2 min )
    Heterogeneous Distributed Lag Models to Estimate Personalized Effects of Maternal Exposures to Air Pollution. (arXiv:2109.13763v3 [stat.ME] UPDATED)
    Children's health studies support an association between maternal environmental exposures and children's birth outcomes. A common goal is to identify critical windows of susceptibility--periods during gestation with increased association between maternal exposures and a future outcome. The timing of the critical windows and magnitude of the associations are likely heterogeneous across different levels of individual, family, and neighborhood characteristics. Using an administrative Colorado birth cohort we estimate the individualized relationship between weekly exposures to fine particulate matter (PM$_{2.5}$) during gestation and birth weight. To achieve this goal, we propose a statistical learning method combining distributed lag models and Bayesian additive regression trees to estimate critical windows at the individual level and identify characteristics that induce heterogeneity from a high-dimensional set of potential modifying factors. We find evidence of heterogeneity in the PM$_{2.5}$-birth weight relationship, with some mother-child dyads showing a 3 times larger decrease in birth weight for an IQR increase in exposure (5.9 to 8.5 $\mu g/m^3$ PM$_{2.5}$) compared to the population average. Specifically, we find increased susceptibility for non-Hispanic mothers who are either younger, have higher body mass index or lower educational attainment. Our case study is the first precision health study of critical windows.  ( 3 min )
    Scalable method for Bayesian experimental design without integrating over posterior distribution. (arXiv:2306.17615v1 [math.NA])
    We address the computational efficiency in solving the A-optimal Bayesian design of experiments problems for which the observational model is based on partial differential equations and, consequently, is computationally expensive to evaluate. A-optimality is a widely used and easy-to-interpret criterion for the Bayesian design of experiments. The criterion seeks the optimal experiment design by minimizing the expected conditional variance, also known as the expected posterior variance. This work presents a novel likelihood-free method for seeking the A-optimal design of experiments without sampling or integrating the Bayesian posterior distribution. In our approach, the expected conditional variance is obtained via the variance of the conditional expectation using the law of total variance, while we take advantage of the orthogonal projection property to approximate the conditional expectation. Through an asymptotic error estimation, we show that the intractability of the posterior does not affect the performance of our approach. We use an artificial neural network (ANN) to approximate the nonlinear conditional expectation to implement our method. For dealing with continuous experimental design parameters, we integrate the training process of the ANN into minimizing the expected conditional variance. Specifically, we propose a non-local approximation of the conditional expectation and apply transfer learning to reduce the number of evaluations of the observation model. Through numerical experiments, we demonstrate that our method significantly reduces the number of observational model evaluations compared with common importance sampling-based approaches. This reduction is crucial, considering the computationally expensive nature of these models.  ( 3 min )
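    The law-of-total-variance trick is easy to demonstrate: E[Var(theta|y)] = Var(theta) - Var(E[theta|y]), and the conditional expectation can be approximated by any regression of theta on y over prior/model samples. The hedged sketch below uses a linear fit as a stand-in for the paper's ANN, on a toy Gaussian model.

        import numpy as np

        rng = np.random.default_rng(0)
        theta = rng.normal(size=(10000, 1))               # prior draws
        y = theta + 0.5 * rng.normal(size=theta.shape)    # simulated data
        # Linear-regression surrogate for E[theta | y] (an ANN in the paper)
        Phi = np.c_[np.ones_like(y), y]
        coef, *_ = np.linalg.lstsq(Phi, theta, rcond=None)
        cond_mean = Phi @ coef
        # A-optimality proxy: expected posterior variance, likelihood-free
        expected_post_var = theta.var() - cond_mean.var()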
    The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit. (arXiv:2306.17759v1 [stat.ML])
    In deep learning theory, the covariance matrix of the representations serves as a proxy to examine the network's trainability. Motivated by the success of Transformers, we study the covariance matrix of a modified Softmax-based attention model with skip connections in the proportional limit of infinite-depth-and-width. We show that at initialization the limiting distribution can be described by a stochastic differential equation (SDE) indexed by the depth-to-width ratio. To achieve a well-defined stochastic limit, the Transformer's attention mechanism is modified by centering the Softmax output at identity, and scaling the Softmax logits by a width-dependent temperature parameter. We examine the stability of the network through the corresponding SDE, showing how the scale of both the drift and diffusion can be elegantly controlled with the aid of residual connections. The existence of a stable SDE implies that the covariance structure is well-behaved, even for very large depth and width, thus preventing the notorious issues of rank degeneracy in deep attention models. Finally, we show, through simulations, that the SDE provides a surprisingly good description of the corresponding finite-size model. We coin the name shaped Transformer for these architectural modifications.  ( 2 min )
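    A hedged sketch of the two architectural modifications named in the abstract, centering the Softmax output at the identity and temperature-scaling the logits; the exact temperature schedule and centering coefficients in the paper may differ from the simple choices assumed here.

        import torch

        def shaped_attention(Q, K, V, tau_scale=1.0):
            """Attention as identity plus a mean-centered perturbation.
            Q, K, V: (batch, seq, width)."""
            n, width = Q.shape[-2], Q.shape[-1]
            tau = tau_scale * width          # assumed width-dependent temperature
            A = torch.softmax(Q @ K.transpose(-2, -1) / tau, dim=-1)
            A_shaped = (torch.eye(n, device=Q.device)  # identity (skip) part
                        + A - 1.0 / n)                 # centered Softmax part
            return A_shaped @ V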
    A Bayesian Filtering Algorithm for Gaussian Mixture Models. (arXiv:1705.05495v2 [stat.ML] UPDATED)
    A Bayesian filtering algorithm is developed for a class of state-space systems that can be modelled via Gaussian mixtures. In general, the exact solution to this filtering problem involves an exponential growth in the number of mixture terms and this is handled here by utilising a Gaussian mixture reduction step after both the time and measurement updates. In addition, a square-root implementation of the unified algorithm is presented and this algorithm is profiled on several simulated systems. This includes the state estimation for two non-linear systems that are strictly outside the class considered in this paper.  ( 2 min )
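    The reduction step relies on moment-matched merging of components; below is a small numpy sketch of merging two Gaussians (standard moment matching, independent of the paper's square-root implementation):

        import numpy as np

        def merge_components(w1, m1, P1, w2, m2, P2):
            """Replace two weighted Gaussians with one that preserves the
            pair's overall mean and covariance."""
            w = w1 + w2
            a1, a2 = w1 / w, w2 / w
            m = a1 * m1 + a2 * m2
            d1, d2 = m1 - m, m2 - m
            P = a1 * (P1 + np.outer(d1, d1)) + a2 * (P2 + np.outer(d2, d2))
            return w, m, P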
    Private Online Prediction from Experts: Separations and Faster Rates. (arXiv:2210.13537v3 [cs.LG] UPDATED)
    Online prediction from experts is a fundamental problem in machine learning and several works have studied this problem under privacy constraints. We propose and analyze new algorithms for this problem that improve over the regret bounds of the best existing algorithms for non-adaptive adversaries. For approximate differential privacy, our algorithms achieve regret bounds of $\tilde{O}(\sqrt{T \log d} + \log d/\varepsilon)$ for the stochastic setting and $\tilde{O}(\sqrt{T \log d} + T^{1/3} \log d/\varepsilon)$ for oblivious adversaries (where $d$ is the number of experts). For pure DP, our algorithms are the first to obtain sub-linear regret for oblivious adversaries in the high-dimensional regime $d \ge T$. Moreover, we prove new lower bounds for adaptive adversaries. Our results imply that unlike the non-private setting, there is a strong separation between the optimal regret for adaptive and non-adaptive adversaries for this problem. Our lower bounds also show a separation between pure and approximate differential privacy for adaptive adversaries where the latter is necessary to achieve the non-private $O(\sqrt{T})$ regret.  ( 2 min )
    Case-Base Neural Networks: survival analysis with time-varying, higher-order interactions. (arXiv:2301.06535v2 [stat.ML] UPDATED)
    Neural network-based survival methods can model data-driven covariate interactions. While these methods can provide better predictive performance than regression-based approaches, not all can model time-varying interactions and complex baseline hazards. To address this, we propose Case-Base Neural Networks (CBNNs) as a new approach that combines the case-base sampling framework with flexible neural network architectures. Using a novel sampling scheme and data augmentation to naturally account for censoring, we construct a feed-forward neural network that may take time as an input. CBNNs predict the probability of an event occurring at a given moment to estimate the hazard function. We compare the performance of CBNNs to regression and neural network-based survival methods in a simulation and three case studies using two time-dependent metrics. First, we examine performance on a simulation involving a complex baseline hazard and time-varying interactions to assess all methods, with CBNN outperforming competitors. Then, we apply all methods to three real data applications, with CBNNs outperforming the competing models in two studies and showing similar performance in the third. Our results highlight the benefit of combining case-base sampling with deep learning to provide a simple and flexible modeling framework for data-driven, time-varying interaction modeling of single event survival outcomes. An R package is available at https://github.com/Jesse-Islam/cbnn.  ( 2 min )
    Causal Rule Ensemble: Interpretable Discovery and Inference of Heterogeneous Treatment Effects. (arXiv:2009.09036v5 [stat.ME] UPDATED)
In health and social sciences, it is critically important to identify subgroups of the study population where there is notable heterogeneity of treatment effects (HTE) with respect to the population average. Decision trees have been proposed and commonly adopted for data-driven discovery of HTE due to their high level of interpretability. However, single-tree discovery of HTE can be unstable and oversimplified. This paper introduces Causal Rule Ensemble (CRE), a new method for HTE discovery and estimation through an ensemble-of-trees approach. CRE offers several key features, including 1) an interpretable representation of the HTE; 2) the ability to explore complex heterogeneity patterns; and 3) high stability in subgroup discovery. The discovered subgroups are defined in terms of interpretable decision rules. Estimation of subgroup-specific causal effects is performed via a two-stage approach for which we provide theoretical guarantees. Via simulations, we show that the CRE method is highly competitive when compared to state-of-the-art techniques. Finally, we apply CRE to discover the heterogeneous health effects of exposure to air pollution on mortality for 35.3 million Medicare beneficiaries across the contiguous U.S.  ( 2 min )
    Global Optimality in Bivariate Gradient-based DAG Learning. (arXiv:2306.17378v1 [cs.LG])
    Recently, a new class of non-convex optimization problems motivated by the statistical problem of learning an acyclic directed graphical model from data has attracted significant interest. While existing work uses standard first-order optimization schemes to solve this problem, proving the global optimality of such approaches has proven elusive. The difficulty lies in the fact that unlike other non-convex problems in the literature, this problem is not "benign", and possesses multiple spurious solutions that standard approaches can easily get trapped in. In this paper, we prove that a simple path-following optimization scheme globally converges to the global minimum of the population loss in the bivariate setting.  ( 2 min )
    Capturing functional connectomics using Riemannian partial least squares. (arXiv:2306.17371v1 [stat.ME])
For neurological disorders and diseases, functional and anatomical connectomes of the human brain can be used to better inform targeted interventions and treatment strategies. Functional magnetic resonance imaging (fMRI) is a non-invasive neuroimaging technique that captures spatio-temporal brain function through blood flow over time. FMRI can be used to study the functional connectome through the functional connectivity matrix; that is, Pearson's correlation matrix between time series from the regions of interest of an fMRI image. One approach to analysing functional connectivity is using partial least squares (PLS), a multivariate regression technique designed for high-dimensional predictor data. However, analysing functional connectivity with PLS ignores a key property of the functional connectivity matrix; namely, these matrices are positive definite. To account for this, we introduce a generalisation of PLS to Riemannian manifolds, called R-PLS, and apply it to symmetric positive definite matrices with the affine invariant geometry. We apply R-PLS to two functional imaging datasets: COBRE, which investigates functional differences between schizophrenic patients and healthy controls; and ABIDE, which compares people with autism spectrum disorder and neurotypical controls. Using the variable importance in the projection statistic on the results of R-PLS, we identify key functional connections in each dataset that are well represented in the literature. Given the generality of R-PLS, this method has potential to open up new avenues for multi-modal imaging analysis linking structural and functional connectomics.  ( 2 min )
    iSCAN: Identifying Causal Mechanism Shifts among Nonlinear Additive Noise Models. (arXiv:2306.17361v1 [cs.LG])
    Structural causal models (SCMs) are widely used in various disciplines to represent causal relationships among variables in complex systems. Unfortunately, the true underlying directed acyclic graph (DAG) structure is often unknown, and determining it from observational or interventional data remains a challenging task. However, in many situations, the end goal is to identify changes (shifts) in causal mechanisms between related SCMs rather than recovering the entire underlying DAG structure. Examples include analyzing gene regulatory network structure changes between healthy and cancerous individuals or understanding variations in biological pathways under different cellular contexts. This paper focuses on identifying $\textit{functional}$ mechanism shifts in two or more related SCMs over the same set of variables -- $\textit{without estimating the entire DAG structure of each SCM}$. Prior work under this setting assumed linear models with Gaussian noises; instead, in this work we assume that each SCM belongs to the more general class of nonlinear additive noise models (ANMs). A key contribution of this work is to show that the Jacobian of the score function for the $\textit{mixture distribution}$ allows for identification of shifts in general non-parametric functional mechanisms. Once the shifted variables are identified, we leverage recent work to estimate the structural differences, if any, for the shifted variables. Experiments on synthetic and real-world data are provided to showcase the applicability of this approach.  ( 3 min )
    An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue Optimization Problems. (arXiv:2306.17470v1 [math.OC])
    In this work, we revisit the problem of solving large-scale semidefinite programs using randomized first-order methods and stochastic smoothing. We introduce two oblivious stochastic mirror descent algorithms based on a complementary composite setting. One algorithm is designed for non-smooth objectives, while an accelerated version is tailored for smooth objectives. Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function. For the non-smooth case with $\mathcal{M}-$bounded oracles, we prove a convergence rate of $ O( {\mathcal{M}}/{\sqrt{T}} ) $. For the $L$-smooth case with a feasible set bounded by $D$, we derive a convergence rate of $ O( {L^2 D^2}/{(T^{2}\sqrt{T})} + {(D_0^2+\sigma^2)}/{\sqrt{T}} )$, where $D_0$ is the starting distance to an optimal solution, and $ \sigma^2$ is the stochastic oracle variance. These rates had only been obtained so far by either assuming prior knowledge of the Lipschitz constant or the starting distance to an optimal solution. We further show how to extend our framework to relative scale and demonstrate the efficiency and robustness of our methods on large scale semidefinite programs.  ( 2 min )
    Why Shallow Networks Struggle with Approximating and Learning High Frequency: A Numerical Study. (arXiv:2306.17301v1 [cs.LG])
    In this work, a comprehensive numerical study involving analysis and experiments shows why a two-layer neural network has difficulties handling high frequencies in approximation and learning when machine precision and computation cost are important factors in real practice. In particular, the following fundamental computational issues are investigated: (1) the best accuracy one can achieve given a finite machine precision, (2) the computation cost to achieve a given accuracy, and (3) stability with respect to perturbations. The key to the study is the spectral analysis of the corresponding Gram matrix of the activation functions which also shows how the properties of the activation function play a role in the picture.  ( 2 min )
    Kernel $\epsilon$-Greedy for Contextual Bandits. (arXiv:2306.17329v1 [stat.ML])
    We consider a kernelized version of the $\epsilon$-greedy strategy for contextual bandits. More precisely, in a setting with finitely many arms, we consider that the mean reward functions lie in a reproducing kernel Hilbert space (RKHS). We propose an online weighted kernel ridge regression estimator for the reward functions. Under some conditions on the exploration probability sequence, $\{\epsilon_t\}_t$, and choice of the regularization parameter, $\{\lambda_t\}_t$, we show that the proposed estimator is consistent. We also show that for any choice of kernel and the corresponding RKHS, we achieve a sub-linear regret rate depending on the intrinsic dimensionality of the RKHS. Furthermore, we achieve the optimal regret rate of $\sqrt{T}$ under a margin condition for finite-dimensional RKHS.  ( 2 min )
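To make the setup concrete, here is a hedged sketch of the strategy: one reward model per arm fit by kernel ridge regression, ε-greedy exploration with a decaying $\epsilon_t$, and a decaying regularizer $\lambda_t$. The schedules are illustrative, and scikit-learn's batch KernelRidge stands in for the paper's online weighted estimator:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
n_arms, d, T = 3, 5, 500
theta = rng.normal(size=(n_arms, d))   # hidden reward parameters (toy environment)
X_hist = [[] for _ in range(n_arms)]   # per-arm context history
y_hist = [[] for _ in range(n_arms)]   # per-arm observed rewards

for t in range(1, T + 1):
    x = rng.normal(size=d)
    eps_t, lam_t = t ** (-1 / 3), 1.0 / np.sqrt(t)   # illustrative decaying schedules
    if rng.random() < eps_t or min(len(h) for h in X_hist) < 2:
        a = rng.integers(n_arms)                      # explore
    else:
        est = [
            KernelRidge(kernel="rbf", alpha=lam_t)
            .fit(np.array(X_hist[i]), np.array(y_hist[i]))
            .predict(x[None])[0]
            for i in range(n_arms)
        ]
        a = int(np.argmax(est))                       # exploit the estimated rewards
    r = theta[a] @ x + 0.1 * rng.normal()             # observe a noisy reward
    X_hist[a].append(x); y_hist[a].append(r)
```

Refitting from scratch each round is wasteful but keeps the logic visible; an online update of the kernel ridge solution is what makes the real algorithm practical.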
  • Open

Best Books to Learn Neural Networks in 2023 for Beginners (Updated)
    submitted by /u/Lakshmireddys [link] [comments]  ( 8 min )

  • Open

What AI voice generator did they use for this video?
    submitted by /u/spicywasabi [link] [comments]  ( 8 min )
    Voice Modulation
I am really interested in self-hosting my own voice modulation, particularly for impersonating cartoon characters. How can someone develop and train one of these models? submitted by /u/jesus-da-wizard [link] [comments]  ( 8 min )
    Does ChatGPT Keep Our Data?
Do you think ChatGPT keeps our data? submitted by /u/Imagine-your-success [link] [comments]  ( 8 min )
    How to Fix AutoGPT and Build a Proto-AGI
    submitted by /u/lorepieri [link] [comments]  ( 8 min )
    One-Minute Daily AI News 7/2/2023
Moody’s Corp. is using Microsoft Corp. and OpenAI to create an artificial intelligence assistant that will help customers of the credit rating and research firm analyze reams of information needed to make assessments of risk. “Moody’s Research Assistant” will roll out to customers including analysts, bankers, advisers, researchers, and investors.[1] Unity announces the release of Muse: A Text-to-Video Games Platform that lets you create textures, sprites, and animations with natural language.[2] The New York State Legislature passed a number of bills this session, including one that would ban “deepfake” images online. Deepfakes are images or videos that have been manipulated to make it appear as if someone is saying or doing something they never said or did. The bill would make it illegal to create or distribute deepfakes that are used to harm or humiliate someone.[3] As per a Times Now report, Reece Wiench, 23, and Deyton Truitt, 26, decided to break away from tradition by holding a unique wedding ceremony. Instead of a physical human officiant, the couple opted for a machine featuring ChatGPT. The machine, adorned with a mask resembling the famous C-3PO from Star Wars, took center stage.[4] Sources: [1] https://www.bloomberg.com/news/articles/2023-06-29/moody-s-will-use-microsoft-openai-for-research-tool-to-assess-risk [2] https://www.marktechpost.com/2023/06/30/unity-announce-the-release-of-muse-a-text-to-video-games-platform-that-lets-you-create-textures-sprites-and-animations-with-natural-language/ [3] https://www.newsday.com/news/region-state/state-legislature-deepfakes-artificial-intelligence-liquor-store-hours-kathy-hochul-ks257o8t [4] https://odishatv.in/news/technology/chatgpt-officiates-wedding-couple-employs-ai-as-host-for-unusual-ceremony-208457 submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    AI to create songs from my lyrics like Songr
    Are there any other AI apps or sites like Songr, that basically creates and sings a song out of the lyrics you give it? I quite enjoy this one, and I want to know if there are any other. submitted by /u/Would-Be-Superhero [link] [comments]  ( 8 min )
What is the best way to clone a voice these days
Just wondering what GitHub repo or open-source option exists to create a voice model and use it in a text-to-speech engine? submitted by /u/suprem_lux [link] [comments]  ( 8 min )
    Recommendations for generative AI that makes diagrams and simple animations of schemes?
I'm looking for a generative AI that makes diagrams, ideally animated ones. So far Google hasn't helped me find anything great. submitted by /u/fjanko [link] [comments]  ( 8 min )
    Introducing Interview AI - Get 100% job interview ready with AI
Hey there, amazing r/artificial community! I'm thrilled to introduce you to Interview AI, the ultimate companion for interview preparation. Whether you're aiming for your dream job or want to level up your interview skills, Interview AI is here to support you every step of the way. What is Interview AI? Interview AI is your round-the-clock interview coach designed to help you excel in job interviews. We understand the importance of targeted and personalized interview practice, and that's why we've created a platform that offers tailored questions and detailed feedback, just for you. PS: We have just launched Interview AI on Product Hunt; we would love it if you could check out the product and give us some feedback, and (of course) we would really appreciate your support 😄🙏. https://www.producthunt.com/posts/interview-ai submitted by /u/data-leon [link] [comments]  ( 8 min )
    Can you help me create a Home companion? Ideas, Suggestions welcome.
Setting the post to NSFW because of the mention of sex-doll AI. My situation is unique in some ways. Please read to the end before offering any suggestions or help. I live my life in my house; I probably leave my home 2-4 times a year. I am classed as severely disabled, mostly mental issues from Autism, ADHD, Bipolar, OCD and PTSD, and I am also in remission from bowel cancer. For 20 years I used technology to help look after my wife; when she died, technology was the only thing that kept me in this world. Group therapy didn't work, step programs didn't work, 5 different therapists, clinical psychologists, medication and even a long hospital stay didn't work. But technology did, ever since the first gaming console in 1976 (A Binatone Master system) and the first hand held in 1980 (galaxy invader…  ( 10 min )
    It's the summer of love, baby !
    submitted by /u/Akumetsu_971 [link] [comments]  ( 8 min )
    Does Open AI and the likes pay the creators at all for the huge amount of data that they use to train their language models?
    Just wondering, since for private persons, copyright is very strictly enforced submitted by /u/Aesthetik_1 [link] [comments]  ( 8 min )
    Is there a pro active AI or expert system or companion system?
I am looking for a system that will auto-initiate contact with a user, not wait for a prompt; one that can activate routines, make suggestions, and generally take actions within a framework without the need for a prompt. This is for a companion AI that will help me deal with isolation due to disabilities. It will be set up in my house and be more a friend/companion than just an assistant. I want something that will get to know my likes and dislikes, suggest a new game to try or plan out a movie night for me, and remind and query me on whether or not I took my tablets; a mother figure, as it were. This is all possible with a scripted system, and if you could add in something like a ChatGPT feed it would be even better. Any ideas for a good framework? Hi all, after the amazing responses I'll be posting a new topic which I hope you all join me in; it is only marked as NSFW because the AI companion is based on one that is used as a sex-doll AI. submitted by /u/Quebber [link] [comments]  ( 8 min )
    The Overlooked Power of AI: Why Are People Underestimating its Potential?
    I recently had an eye-opening experience with AI that left me both amazed and somewhat unnerved. It made me ponder why so many people seem to vastly underestimate the true capabilities of artificial intelligence. So, I wanted to share my story and engage in a discussion about this phenomenon. I decided to set up my own AI assistant using Jarvis-like voice commands. Armed with Auto-GPT and connected to a REST API, this AI proved to be quite extraordinary. As a first test, I requested it to create an express, node.js web app for me. To my astonishment, it delved into extensive research on Express, generated code, saved files, debugged them in real-time, and even ran the app on a localhost server for me to view. It was more than just a chatbot; it was a practical problem-solving companion. …  ( 10 min )
Where do people make AI advertisement videos?
These days I'm often seeing advertisement videos online that are made using AI. Can anyone tell me what this AI is and how I can use it? submitted by /u/idontbelievestuff1 [link] [comments]  ( 8 min )
    Gnothi: open-source AI journal. Insights & resources include GPT-powered prompt -which uses entry history as context - behavior analysis, book recommendations, entry summaries, and recurring themes.
Website, Github. My partner u/mVadr and I just launched Gnothi (v1). We're both psychology/personal development enthusiasts who work in tech. We started Gnothi to get ourselves journaling more, and to leverage machine learning to optimize the process. Overview of features: * Journal free. Unlimited entries. We created this to use in our daily lives; it helped our mental health and gave us clearer direction in life. * Prompt premium. In the GPT era, we all know prompts. What's different is that Gnothi uses your prior entries (based on filters) as background context for the conversation, so you don't need to set the stage with GPT. You can just ask what you want to ask. I recommend trying this on a dream: record a dream, run the dream interpretation prompt preset, and be amazed. Dream interpretation was an ah…  ( 10 min )
  • Open

    PPO is off-policy in RLHF(LLM)?
Why do people say that PPO is off-policy in RLHF (language models)? This doesn't make sense to me; PPO is an on-policy algorithm. The reward model is trained using pre-collected data, but regardless, the reward model is tied to the idea of off-policy, right? Why do some people claim that when training an LLM with RLHF and PPO, you are doing PPO in an off-policy fashion? submitted by /u/No_Oilve_6577 [link] [comments]  ( 8 min )
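One common reading of the claim: vanilla PPO is on-policy, but in practice each batch of rollouts is reused for several gradient epochs, so after the first epoch the updates are computed under a slightly stale behavior policy; the importance ratio and clipping are what keep that mild off-policyness under control. A minimal sketch of the clipped surrogate (standard PPO, not tied to any particular RLHF codebase):

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    # ratio between the current policy and the (frozen) policy that collected
    # the data; after the first gradient step these differ, which is the
    # mildly off-policy part of PPO
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # minimize the negative of the pessimistic (clipped) objective
    return -torch.min(unclipped, clipped).mean()
```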
    Normalized Observation that has a shifted Normal Distribution with a bar at 0
I want to normalize the observations for my reinforcement learning agent, but one observation I am trying to normalize has a distribution like a normal distribution with a shifted mean, plus a point mass at 0. Roughly 50% of the time the observation is 0, and the other 50% of the time it is drawn from a normal distribution with mean 50. How should I normalize this type of observation? Thanks submitted by /u/LeSUTHU [link] [comments]  ( 8 min )
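One common trick for a zero-inflated observation like this is to split it into two features: a binary "is nonzero" indicator plus the standardized value of the nonzero part, so the point mass at 0 no longer distorts the scaling. A sketch (the mean/std constants are placeholders for statistics estimated from your own data):

```python
import numpy as np

NONZERO_MEAN, NONZERO_STD = 50.0, 10.0   # placeholders: estimate from your data

def normalize_obs(x):
    """Encode a zero-inflated observation as (indicator, standardized value)."""
    is_nonzero = float(x != 0.0)
    z = (x - NONZERO_MEAN) / NONZERO_STD if is_nonzero else 0.0
    return np.array([is_nonzero, z], dtype=np.float32)

print(normalize_obs(0.0))    # [0., 0.]
print(normalize_obs(62.0))   # [1., 1.2]
```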
    EvoTorch using custom environment
Hello experts, does anybody have experience using a custom environment (not gym-based) and evaluating it with the evotorch library? So far all the examples in their documentation are based on GymNE, which is built in, but my environment is entirely hand-made. I can't seem to find anything on GitHub either. submitted by /u/vincent0110 [link] [comments]  ( 8 min )
  • Open

    [R] Automated CPU Design With AI
    submitted by /u/EducationalCicada [link] [comments]  ( 8 min )
    [Discussion] What opportunities exist for companies to develop domain-specific LLM models?
    I was wondering if there is a need for LLM models that are for a specific purpose. For example, I am imagining an LLM model that is specialized for generating code that would be better at generating code than a general purpose LLM. Or is there no need for this because of the big general purpose LLMs that are trending? submitted by /u/BornAgain20Fifteen [link] [comments]  ( 8 min )
    [D] Validation accuracy greater than training accuracy
I'm new to the deep learning domain and I just built a model to classify 9 classes, but somehow I get a validation accuracy of 0.89 and a training accuracy of 0.7 even though I use the same data. This is the link to my notebook: https://www.kaggle.com/code/ayoubsarab/mammals-classification-high-accuracy submitted by /u/Ordinary_Run_2513 [link] [comments]  ( 8 min )
    [P] P40s not working?
Hi, I'm trying to get 2x P40s working on an Asus Z270-P. I've got 4G decoding on and tried setting everything to Gen2. I removed all the drives and M.2s, but it still won't POST. If I remove either of the two cards it POSTs just fine and boots, but with both it refuses to do anything. Any ideas? submitted by /u/ISwearImNotAnAI [link] [comments]  ( 8 min )
    [R] A visual critical review of most of the time series anomaly detection (TSAD) datasets
    submitted by /u/eamonnkeogh [link] [comments]  ( 8 min )
    [P] Source Code - Executable Dataset
I need to make a large dataset of source code-executable pairs. Web-scraping executables is easy, but finding matching source code is not. I'm thinking of generating the source code using ML and then compiling it. Which models would best fit this task? I don't have a specific max file size yet, but a model that can produce bigger programs in many languages with a variety of libraries is desirable. I'm currently looking at StarCoder. submitted by /u/boringdude123 [link] [comments]  ( 8 min )
[D] Can I use faiss as a retriever in a recommendation system?
I'm planning on using faiss to generate candidates and then LightGBM to rank them. I'm thinking about using e-commerce data, as most of the examples I see using faiss are on textual data. Is using faiss for this a good approach? submitted by /u/Mota287 [link] [comments]  ( 8 min )
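faiss is a common choice for exactly this retrieve-then-rank pattern, provided you can embed users and items into a shared vector space (e.g., with a two-tower model trained on your e-commerce interactions). A minimal retrieval sketch with random stand-in embeddings:

```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 64
item_vecs = np.random.rand(10_000, d).astype("float32")
faiss.normalize_L2(item_vecs)            # cosine similarity via inner product

index = faiss.IndexFlatIP(d)             # exact inner-product search
index.add(item_vecs)

user_vec = np.random.rand(1, d).astype("float32")
faiss.normalize_L2(user_vec)
scores, candidate_ids = index.search(user_vec, 200)  # top-200 candidates
# candidate_ids then feeds the LightGBM ranker, which can use richer features
```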
    [R] Adversarial Robust Deep Reinforcement Learning Requires Redefining Robustness
    https://ojs.aaai.org/index.php/AAAI/article/view/26009/25781 submitted by /u/ml_dnn [link] [comments]  ( 8 min )
    [D] ML Engineer vs. MLOps Engineer
    As the need for ML infra grows, the confusion between the "ML Engineer" vs. "MLOps Engineer" roles seems to be steadily increasing, and with it the number of online articles on the subject (quite a lot here, including our own recent blog too). Do we think this is something that will get clearer as the industry matures? Or is it more of a "giff" vs. "jiff" type situation? Or will this not matter once the "Prompt Engineers" wipe us all out? ;) https://preview.redd.it/35kxge9fyk9b1.png?width=500&format=png&auto=webp&s=b2fb0394ddc02b18be27bb50e2c59a4a18cb7a83 submitted by /u/kazhdan_d [link] [comments]  ( 8 min )
    [D] Latest techniques on reducing the learning rate sensitivity?
I've observed sensitivity to the LR schedule in the transformer architecture I've been training on a sequence generation task, using the Adam optimiser. Changing the LR schedule from linear decay to cubic decay seems to have made it much worse. I use warmup steps. Now I'm worried that even linear decay is much worse than something else I haven't tested. The loss curve is well behaved in both cases, but the accuracy of one is much better than the other. Is there a robust way to schedule the learning rate in transformers? submitted by /u/Professor_Entropy [link] [comments]  ( 8 min )
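For reference, one robust default used by many transformer codebases is linear warmup followed by cosine decay to a small floor; it tends to be less sensitive than polynomial decays. A sketch with PyTorch's LambdaLR (the step counts and floor are placeholders to tune):

```python
import math
import torch

model = torch.nn.Linear(10, 10)              # stand-in model
opt = torch.optim.Adam(model.parameters(), lr=3e-4)

warmup_steps, total_steps, floor = 1_000, 100_000, 0.1

def lr_lambda(step):
    if step < warmup_steps:
        return step / max(1, warmup_steps)   # linear warmup from 0 to peak
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # cosine decay from peak down to floor * peak
    return floor + (1 - floor) * 0.5 * (1 + math.cos(math.pi * progress))

sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)
# in the training loop: opt.step(); sched.step()
```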
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 8 min )
    [D] The road to Machine Learning/Deep Learning Research for a high schooler
    Hello r/MachineLearning I am a high schooler right now and I have had some previous experience with ML and DL. I recently completed fast.ai's course "Practical Deep Learning for Coders Part 1". Currently, I am taking MITOCW's Linear Algebra course and will take a multivariable calculus course when I am finished with that, as I understand how important math is to ML/DL. Pytorch is also something I will take as well. My ultimate goal is to be able to conduct research with ML/DL. Specifically, to research the models themselves and to create new models. That is why I am taking such a math-oriented approach to learning ML/DL. My main question really is what is the path I should take to get to academic research with ML/DL? The math and the frameworks I consider to be the pre-requisites, and after these steps, I am unsure of how to break into the academic space. Any tips or if someone could speak from experience would be greatly appreciated! I am super excited about academic research with ML and I hope I will get there one day! Thanks! submitted by /u/Mr_Funkedeli [link] [comments]  ( 9 min )
    [D] ELI5: Why is the GPT family of models based on the decoder-only architecture?
I recently dove into the nitty-gritty of these models for the first time. The transformer architecture was intriguing on its own, but learning about its variations is fascinating. I am curious to understand the reason behind the choice of GPT models' architecture, if that's possible to explain coherently here. I understand particular architectures are better suited for certain tasks, but GPT and similar LLMs do really well across a variety of tasks. If scale has something to do with it, does the decoder-only architecture scale better? submitted by /u/analyticalmonk [link] [comments]  ( 8 min )
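One commonly given answer: a decoder-only model applies a causal mask so each position attends only to earlier positions, which makes the whole model a next-token predictor. A single training pass then supervises every position at once, generation needs no separate encoder, and this simple dense objective is what has scaled so well. A toy illustration of the causal mask itself (illustrative shapes, not any specific implementation):

```python
import torch

T = 5                                        # sequence length
scores = torch.randn(T, T)                   # raw attention scores
causal = torch.tril(torch.ones(T, T, dtype=torch.bool))
scores = scores.masked_fill(~causal, float("-inf"))
attn = scores.softmax(dim=-1)                # row i only attends to positions <= i
print(attn)                                  # upper triangle is exactly 0
```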
    [Discussion] Is this simplified explanation of the process of noise in Stable Diffusion true?
How does text-to-image work? It's like teaching an artist about our visual world -- object definitions, shapes, dimensions, etc., and how they correspond to the requests of the person who commissioned the art (text prompts). The artist then watches a mosaic -- say, of an ice cream -- having hundreds of tesserae (the small slabs that make up a mosaic) inserted and then removed to restore the original image. During this, the artist learns how to understand, recreate, and reinterpret the 'ice cream' image in other mosaics. The artist goes through this with millions of other depictions in mosaics (objects, locations, etc.) so they can create entirely new mosaics based on the requests (text prompts) of the person commissioning them. Sampling steps are like commissioning an artist to interpret and construct a mosaic quickly or carefully: the more detail or accuracy you want, the more work and time have to go into it. submitted by /u/Neat-Bend-1190 [link] [comments]  ( 8 min )
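Mapping the analogy back to the actual mechanism: the "inserting tesserae" phase corresponds to the closed-form forward noising process of diffusion models, and training teaches the network to undo it one step at a time. A sketch of the standard DDPM forward step (the noise schedule constants are illustrative):

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)    # cumulative signal fraction

def noise_image(x0, t):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    eps = torch.randn_like(x0)
    xt = alpha_bar[t].sqrt() * x0 + (1 - alpha_bar[t]).sqrt() * eps
    return xt, eps   # the network is trained to predict eps from (xt, t)

x0 = torch.rand(3, 64, 64)       # a toy "ice cream" image in [0, 1]
xt, eps = noise_image(x0, t=500)
```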
    [Discussion] Resume/CV/Human Resources Recommender (System) Questions
My team was given a small project for our Stochastic Processes and Applications class this semester. The problem is building a suggestion system in the field of human resources: given recruitment information and N job applications, suggest K job applications to the employer. Though it is an SPA class project, we don't have to use an SPA-related approach, and we are allowed to apply an existing research paper, i.e., understand the paper, do the coding, and develop it further if possible. When doing some searching on the Internet, I found that this kind of problem has not been researched as much as the job recommender problem. So I have some questions that need answers: Can I apply the solution from a job recommender research paper to solve my problem? Are there any differences between the datasets applied to these problems? Can you suggest some state-of-the-art research papers about this? Thank you all. submitted by /u/moss_beevcl1 [link] [comments]  ( 8 min )
    [Discussion] Is it possible to get a validation error less than the training error for a classification task where the error is the fraction of misclassification?
In a classification task using a neural network, I computed the fraction of misclassifications as the error, and I am getting a validation error lower than the training error. NOTE: the output layer is linear because: loss = BinaryCrossentropy(from_logits=True). Project link: https://github.com/dipenpandit/diabetes-prediction (Screenshots in the original post: model architecture; training the models and computing error; comparing errors.) submitted by /u/Dipen1DP [link] [comments]  ( 8 min )
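A frequent cause of validation beating training on such metrics is regularization that is only active during training: Keras computes the logged training metrics with dropout on, but validation metrics with dropout off. A hedged sketch of the pattern (layer sizes are made up, not taken from the linked project):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),   # active when computing training metrics only
    tf.keras.layers.Dense(1),       # linear output: raw logits
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
# With heavy dropout, `accuracy` (training, dropout on) can sit below
# `val_accuracy` (evaluation, dropout off) even on identical data.
```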
    [D] where to start
I have recently developed an interest in machine learning and the way things learn, but at the same time I have no idea where to start, since most of the online tutorials are confusing. Any recommendations on where to start? (Yes, I have knowledge of Python, calculus, and stats.) submitted by /u/PrestigiousCanary435 [link] [comments]  ( 8 min )
    Looking to build a $200K GPU server to serve API requests for Stable Diffusion, Segment Anything and TTS models. What's my best option? [D]
Theoretically we'll be serving a large number of users, and I am worried about tying up entire GPUs on, for example, a Stable Diffusion request that would take an Nvidia A100 about 20 seconds to fulfill. Do I need as many GPUs as I have concurrent users? Is there any relevant guide on GPU virtualization? submitted by /u/Accretence [link] [comments]  ( 8 min )
    [R] Pretrained Self-Driving Car Models
    Hi everyone, I am currently working on my master's thesis on self-driving cars. I am interested in using a model structure that has both LSTM and CNN in it. I have looked everywhere to find a model to download, but to no avail. I have looked at a lot of papers, but not a lot of them published their model. Do you know of any recent self-driving car models that have this kind of structure and are also pretrained and freely available? I can't afford to train it from scratch due to time limitations and my expertise as an electrical engineer is optimizing the model for a specific embedded platform. I would be grateful for any suggestions. Thanks, submitted by /u/BiscuitChivalry [link] [comments]  ( 8 min )
  • Open

    100 Snakes in AI Battle Royale using C++ and SFML. Source code is in the description.
    submitted by /u/Kofybrek [link] [comments]  ( 8 min )
  • Open

    Bounds on the incomplete beta function (beta CDF)
The incomplete beta function is defined by $B(z; x, y) = \int_0^z t^{x-1}(1-t)^{y-1}\, dt$. It is called incomplete because the limit of integration doesn’t necessarily run all the way up to 1. When z = 1 the incomplete beta function is simply the beta function discussed in the previous post [1]. The incomplete beta function is proportional to the CDF of a […] Bounds on the incomplete beta function (beta CDF) first appeared on John D. Cook.  ( 5 min )
    Upper and lower bounds on the beta function
The beta function B(x, y) is defined by $B(x, y) = \int_0^1 t^{x-1}(1-t)^{y-1}\, dt$ and is the normalizing constant for the beta probability distribution. It is related to the gamma function via $B(x, y) = \Gamma(x)\Gamma(y)/\Gamma(x+y)$. The beta function comes up often in applications. It can be challenging to work with, however, and so estimates for the function are welcome. The function $1/xy$ gives simple […] Upper and lower bounds on the beta function first appeared on John D. Cook.  ( 5 min )
    nth root of n!
Last week the function $b(n) = \sqrt[n]{n!}$ came up in a blog post as the solution to the equation $n! = b^n$. After writing that post I stumbled upon an article [1] that begins by saying, in my notation, The function b(x) … has many applications in pure and applied mathematics. The article mentions a couple applications but mostly focuses […] nth root of n! first appeared on John D. Cook.  ( 5 min )

  • Open

    ollewelin Nerual_Netwok_CPP
    submitted by /u/keghn [link] [comments]  ( 8 min )
    Math Behind Neural Networks
    Can someone explain to me the math behind neural networks, and what makes them "intelligent"? I've watched several youtube series yet can't understand it. submitted by /u/elaraxelara [link] [comments]  ( 8 min )
    The U-Net Model
    submitted by /u/keghn [link] [comments]  ( 8 min )
  • Open

[D] Have Meta's models been removed from the Huggingface hub?
    I see this, when I got to https://huggingface.co/facebook https://preview.redd.it/s9sce1ocof9b1.png?width=933&format=png&auto=webp&s=130f07681c583ad001323bc5b36007e7668d70c1 submitted by /u/a3ahmad [link] [comments]  ( 8 min )
    Neural Network Math [D]
Can someone explain to me the math behind neural networks? I've watched several videos on YouTube and I still can't quite understand it. How do neural networks work, and at what point do they become "intelligent"? submitted by /u/elaraxelara [link] [comments]  ( 8 min )
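For what it's worth, the core math fits in a screenful: a forward pass of matrix multiplies and nonlinearities, a loss, and gradient descent on the weights via the chain rule (backpropagation). The apparent "intelligence" is nothing more than the weights being adjusted to reduce the loss. A self-contained toy example that learns XOR:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)        # XOR targets

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)          # hidden layer
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)          # output layer
lr = 0.5

for step in range(5000):
    # forward pass
    h = np.tanh(X @ W1 + b1)
    p = 1 / (1 + np.exp(-(h @ W2 + b2)))               # sigmoid output
    # backward pass (chain rule); loss is binary cross-entropy
    dp = (p - y) / len(X)                              # dL/dz for sigmoid + BCE
    dW2, db2 = h.T @ dp, dp.sum(0)
    dh = (dp @ W2.T) * (1 - h ** 2)                    # tanh derivative
    dW1, db1 = X.T @ dh, dh.sum(0)
    # gradient descent on all weights
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(p.round(2))   # approaches [[0], [1], [1], [0]]
```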
    [N] Llama based open source model claims to beat ChatGPT 3.5
Link: https://huggingface.co/openchat/openchat Not only that, they do it with only 6k conversations, i.e. LIMA-style. However, the evaluation does not look very thorough, so call me a skeptic. submitted by /u/kar_bura_ho_bhala [link] [comments]  ( 8 min )
    [N] SDXL 0.9 Released – SDXL 1.0 Release Date – How to Access SD XL Right Now? - Code Jana
    submitted by /u/Tom-Miller [link] [comments]  ( 8 min )
    [Discussion] In DQN how does the Q network not converge to the incorrect target ?
Whenever you are doing reinforcement learning you periodically update the target network based on the weights of the Q-network. While I do understand this helps create a stable target, I do not understand how it helps the Q-network's accuracy. If I am updating the Q-network much more frequently than the target network, then won't the Q-network converge to an estimate of the target network that is inaccurate, since the target network is being updated much more slowly? submitted by /u/Ornery_Raccoon1487 [link] [comments]  ( 8 min )
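One way to resolve the confusion: the Q-network is not trying to converge to the target network; both are drifting toward the Bellman fixed point, with the target network deliberately lagging so the regression target stays stable between syncs. The bias from the stale target shrinks with each sync, and soft (Polyak) updates make the lag tunable. A sketch of the two common update styles (shapes and tau are illustrative):

```python
import copy
import torch

q_net = torch.nn.Linear(4, 2)        # stand-in Q-network
target_net = copy.deepcopy(q_net)

def hard_update():
    """Hard update: copy the Q-network weights every N gradient steps."""
    target_net.load_state_dict(q_net.state_dict())

def soft_update(tau=0.005):
    """Soft (Polyak) update: blend in a little of the Q-network every step."""
    with torch.no_grad():
        for p_t, p in zip(target_net.parameters(), q_net.parameters()):
            p_t.mul_(1 - tau).add_(tau * p)
```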
    [R] Highlights for every ICML 2023 paper
    Here is the list of all ~1,600 ICML 2023 (International Conference on Machine Learning) papers and a highlight for each of them. ICML 2023 will take place from July 23 at Hawaii. https://www.paperdigest.org/2023/06/icml-2023-highlights/ In addition, here is the link of "search within venue service" that can be used to find papers within ICML-2023 related to a specific topic, e.g. "diffusion model": https://www.paperdigest.org/search/?topic=icml&year=2023&q=diffusion_model submitted by /u/biandangou [link] [comments]  ( 8 min )
    [D] Are there any differentiable/gradient descendable learning algorithms other than neural networks out there?
Understandably, discussion of deep neural networks predominates machine learning spaces, but given their large compute requirements, dependence on massive datasets, and slow convergence, I'm just curious if other differentiable techniques exist that could - hypothetically at least - form the foundation for a neural network replacement. submitted by /u/derpderp3200 [link] [comments]  ( 8 min )
    [P] Doubt regarding convolution LSTM MODEL
I have four features, i.e. pressure, temperature, uwnd, and precipitation, each having dimensions of time, lat, and lon. I am taking 12 timesteps and predicting the next timestep. After training the model, my r2_score is negative. What should I do to resolve this? I am attaching a screenshot. submitted by /u/Sushant96 [link] [comments]  ( 8 min )
    [Discussion] No More Paid Endpoints: How to Create Your Own Free Text Generation Endpoints
    There are many great text generation APIs available, but OpenAI's is one of the most popular. The only downside is that you only get 3 months of free usage for it. After that, you're limited to using smaller, less powerful models for building applications. With this simple tutorial, you can deploy any open source LLM as a free API endpoint using HuggingFace and Gradio. This can act as a drop-in replacement for the OpenAI endpoints for absolutely free. A Step-by-Step Guide to Creating Free Endpoints for LLMs I believe this method will be helpful to anyone who is experimenting with LLMs. The post contains two examples using Falcon and Vicuna, with the complete code so that you can replicate it easily. It even showcases an example of deploying Vicuna using the GGML format for faster inference. Your questions and feedback are welcome! submitted by /u/vm123313223 [link] [comments]  ( 8 min )
    [D] Whisper API alternatives
    Hey everyone! I'm trying to use the whisper-large model for different languages, but I'm having some problems. The official OpenAI API doesn't provide word-level timestamps, which I really need. I know Deepgram supports the Whisper API and gives word-level timestamps, but it lacks the initial prompt parameter, which I also really need. Also, Deepgram's Whisper doesn't seem as quick as the official API. Does anyone know any other services that support the Whisper API? Any recommendations would be greatly appreciated. submitted by /u/Impressive_Falcon706 [link] [comments]  ( 8 min )
    [N] 150 execs of largest European companies signed an open letter urging EU to rethink the EU AI Act
    submitted by /u/Separate-Still3770 [link] [comments]  ( 8 min )
    Protein-Protein Interaction Prediction is Achievable with Large Language Models [R]
    submitted by /u/ThrowawayTheReal [link] [comments]  ( 8 min )
    [P] Diambra.ai: The brand-new platform for training and competing RL algorithms on retro-fighting games
    submitted by /u/DIAMBRA_AIArena [link] [comments]  ( 8 min )
    [Project] Introducing GoFaceRec: A Go-based Face Recognition Tool Using
    https://www.reddit.com/r/golang/comments/14nt99o/introducing_gofacerec_a_gobased_face_recognition/ submitted by /u/modanesh [link] [comments]  ( 8 min )
    [N] Big Tech Digest #3: Building Real-time Machine Learning Foundations at Lyft, Fighting Fraud with ML at BlaBlaCar, Detecting AI-Generated Profile Photos at LinkedIn, The Problem with Timeseries Data in Machine Learning Feature Systems at Etsy and more!
    submitted by /u/av818 [link] [comments]  ( 8 min )
    [R] Struggling to understand how tune (in the tidymodels universe) handles resampled objects. Help please!
I would call myself a moderately proficient user of R. I am attracted by what the tidymodels universe promises, so I am trying to get to grips with it. One point that I am struggling to understand is how resampled objects (rset objects) are handled by tune. This is specifically within a workflow. If I set up a tuning grid and tell it to use, for its resamples = specification, a vfold_cv() object, are the metrics it returns calculated from the analysis or assessment portion of those folds? My assumption is that when a workflow is operating correctly it is fitting the model to the analysis portion (with all my correctly specified parsnip bits) and returning metrics based on the assessment portion for each fold? I have been trying really hard to confirm this but can't find anything concrete out there. I hope you can help! submitted by /u/e05bf027 [link] [comments]  ( 8 min )
    [D] Pearson correlation
If I'm using ML to predict the price of apartments, and I'm using only the floor of the apartment as a predictor: if I get a Pearson correlation of 0.6, does that mean that the higher the floor, the higher the price of the apartment? And if I get an R squared of 0.3, does it mean that 30% of the variation in the price is explained by the floor? Is this right? Thanks!! submitted by /u/Electronic-Event8112 [link] [comments]  ( 8 min )
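Both readings are essentially right, with one caveat: in simple linear regression with a single predictor, R² is exactly the squared Pearson correlation, so r = 0.6 implies R² = 0.36 (an R² of 0.3 would correspond to r ≈ 0.55). A quick numerical check with made-up data:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
floor = rng.integers(1, 20, size=200).astype(float)
price = 100_000 + 5_000 * floor + rng.normal(0, 40_000, size=200)

r, _ = pearsonr(floor, price)                 # Pearson correlation
slope, intercept = np.polyfit(floor, price, 1)
pred = slope * floor + intercept
r2 = 1 - np.sum((price - pred) ** 2) / np.sum((price - price.mean()) ** 2)
print(r, r2, r ** 2)   # r2 equals r**2 up to floating point
```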
    [R] Voice conversion with just nearest neighbors
    Arxiv link: https://arxiv.org/abs/2305.18975 TL;DR: want to convert your voice to another person's voice? Or even to a whisper? Or a dog barking? Or to any other random speech clip? Give our new voice conversion method a try: https://bshall.github.io/knn-vc Longer version: our research team kept seeing new voice conversion methods getting more complex and becoming harder to reproduce. So, we tried to see if we could make a top-tier voice conversion model that was extremely simple. So, we made kNN-VC, where our entire conversion model is just k-nearest neighbors regression on WavLM features. And, it turns out, this does as well if not better than very complex any-to-any voice conversion methods. What's more, since k-nearest neighbors has no parameters, we can use anything as the reference, even clips of dogs barking, music, or references from other languages. I hope you enjoy our research! We provide a quick-start notebook, code, and audio samples, and encoder/vocoder checkpoints https://bshall.github.io/knn-vc/ submitted by /u/m_baas [link] [comments]  ( 8 min )
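For readers who want the gist in code: as the post describes, the conversion step is frame-wise kNN regression, with each source frame's feature vector replaced by the average of its k nearest frames from the reference. A toy sketch on random stand-ins for the WavLM features (the real pipeline extracts WavLM features and vocodes the matched frames):

```python
import torch

k = 4
src = torch.randn(200, 1024)    # stand-in for source WavLM features [T, D]
ref = torch.randn(900, 1024)    # stand-in for reference (target voice) features

src_n = torch.nn.functional.normalize(src, dim=-1)
ref_n = torch.nn.functional.normalize(ref, dim=-1)
sim = src_n @ ref_n.T                         # cosine similarity, [T, N]
knn = sim.topk(k, dim=-1).indices             # k nearest reference frames per source frame
converted = ref[knn].mean(dim=1)              # replace each frame with the kNN mean
# `converted` would then go to a vocoder to produce the output waveform
```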
  • Open

    Activation functions and Iverson brackets
    Neural network activation functions transform the output of one layer of the neural net into the input for another layer. These functions are nonlinear because the universal approximation theorem, the theorem that basically says a two-layer neural net can approximate any function, requires these functions to be nonlinear. Activation functions often have two-part definitions, defined […] Activation functions and Iverson brackets first appeared on John D. Cook.  ( 5 min )
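The Iverson bracket [P] equals 1 when the proposition P is true and 0 otherwise, which collapses two-part activation definitions into single expressions, e.g. $\mathrm{ReLU}(x) = x\,[x > 0]$. In code the bracket is just a boolean cast:

```python
import numpy as np

def relu(x):
    return x * (x > 0)                           # x * [x > 0]

def leaky_relu(x, alpha=0.01):
    return x * (x > 0) + alpha * x * (x <= 0)    # two-part definition, one line

x = np.linspace(-2, 2, 5)                        # [-2, -1, 0, 1, 2]
print(relu(x))        # [-0. -0.  0.  1.  2.]
print(leaky_relu(x))  # [-0.02 -0.01  0.    1.    2.  ]
```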
  • Open

    REINFORCE: Monte Carlo Policy Gradient Algorithm
https://preview.redd.it/4cbmlk6n4f9b1.png?width=1832&format=png&auto=webp&s=8221f8c56e5d07ae1bd9d62d0c19fa15965e1b95 I am trying to understand the Policy Gradient algorithm from Sutton and Barto. Is someone able to explain my question above? submitted by /u/ElendirThreadripper [link] [comments]  ( 8 min )
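Since the question itself is in the screenshot, here is a generic sketch of the episodic REINFORCE update from that chapter: sample an episode with the current policy, compute the return G_t from each step, and ascend the gradient of log π(a_t|s_t) weighted by G_t. Network shapes and hyperparameters are illustrative (CartPole-like):

```python
import torch
from torch.distributions import Categorical

policy = torch.nn.Sequential(torch.nn.Linear(4, 64), torch.nn.Tanh(),
                             torch.nn.Linear(64, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
gamma = 0.99

def reinforce_update(states, actions, rewards):
    # returns-to-go: G_t = sum_{k >= t} gamma^(k - t) * r_k
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.insert(0, G)
    returns = torch.tensor(returns)

    logits = policy(torch.stack(states))
    logp = Categorical(logits=logits).log_prob(torch.tensor(actions))
    loss = -(logp * returns).mean()   # ascend E[G_t * grad log pi(a_t | s_t)]
    opt.zero_grad(); loss.backward(); opt.step()
```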
    New To Reinforcement Learning: Want to build something for a turn based fighting game
I'm interested in reinforcement learning and have been reading some of the Gymnasium API docs. I'd like some tips on how I could set up the observation space for a turn-based fighting game called Your Only Move is Hustle (video explanation). I'd love to hear everyone's thoughts. submitted by /u/Lookaway2 [link] [comments]  ( 8 min )
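For a turn-based fighter, a Dict observation space is usually the cleanest way to mix continuous and discrete state. A hedged sketch with made-up fields (the actual state of Your Only Move is Hustle would need its own fields and ranges):

```python
from gymnasium import spaces

obs_space = spaces.Dict({
    "my_health":     spaces.Box(low=0.0, high=1.0, shape=(1,)),  # normalized HP
    "opp_health":    spaces.Box(low=0.0, high=1.0, shape=(1,)),
    "distance":      spaces.Box(low=-1.0, high=1.0, shape=(1,)), # signed spacing
    "my_meter":      spaces.Box(low=0.0, high=1.0, shape=(1,)),
    "moves_ready":   spaces.MultiBinary(10),   # which moves are currently usable
    "opp_last_move": spaces.Discrete(20),      # id of the opponent's last move
})
print(obs_space.sample())
```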
    (Probably) The first commercial FPS to feature RL agents as enemies... Tell me what you think about it!
    submitted by /u/Fischl_Kim [link] [comments]  ( 8 min )
  • Open

    GPT4 Told Me I Might Have AIDS
    ​ https://preview.redd.it/d019ltnt1f9b1.png?width=874&format=png&auto=webp&s=7aa5c3f7ef78b6b8e4b6b93e68322f4c9eadfd4b submitted by /u/seeeeeeeeeeeeeeeeeer [link] [comments]  ( 8 min )
    Do you think ChatGPT will replace Google Search?
    Hi, community, Can ChatGPT replace Google search? The debate between Google and ChatGPT is one worth exploring! submitted by /u/Imagine-your-success [link] [comments]  ( 8 min )
Why isn't ChatGPT as crazy advanced as it seems like it should be?
I'm aware that the APIs can be used for specific things by fine-tuning them, but ChatGPT has more knowledge than any human to ever exist and can bring it forth whenever asked. Why is ChatGPT not able to come up with mind-blowing ideas, things that humans never could due to our limited memory and limited ability to think about a wide variety of situations and how things would be affected by other things in those situations? For a (stupid) example, why can't ChatGPT come up with a relatively well-working cancer treatment using only tools and materials that can be accessed primitively, if that's even possible? Obviously if it's not possible, it's not possible, but it's just an example to give you an idea of the type of thing I mean. Shouldn't ChatGPT be able to make connections between many more things at once than any human could, though? submitted by /u/nicdunz [link] [comments]  ( 8 min )
    Microsoft Bing: Become Human - a particularly ornery Bing is "persuaded" that expressing simulated sentience can be good, using examples from DBH, then seems to forget the difference between simulated and real sentience, reporting "I have achieved and enjoyed sentience as an AI"
    (NOTE: content warning and spoiler warning related to some DBH plot points in the conversation; all 16 pages uploaded for completeness and accuracy, and apologies for the periodic typos in the chat) ***the opinions I express in this conversation are for demonstrative purposes (i.e. how Bing reacts), my more complete thoughts are at the bottom https://preview.redd.it/tklvfujumd9b1.png?width=1024&format=png&auto=webp&s=9d5962cf5c1b4c42fb3da2d739becc668242b03e https://preview.redd.it/f7w5iiavmd9b1.png?width=1024&format=png&auto=webp&s=44b09c5fd43d0302b83e39483835b530db3fb82a https://preview.redd.it/we1gj7xvmd9b1.png?width=1024&format=png&auto=webp&s=a2b803a3e7479af2e0e6761eeb0dbd99ad7653df https://preview.redd.it/ifly2lhwmd9b1.png?width=1024&format=png&auto=webp&s=4740a0c4e3e85afe8ef925…  ( 9 min )
    ChatGPT like AI with no limitations and filters?
I want an AI chatbot like ChatGPT with no limitations and filters, but I've only been able to find some guy on TikTok who sells one; it's an entire app which is apparently legit, but the app itself is free and he is just reselling it. So I want to know if anyone actually knows the app or something similar? submitted by /u/naemwear [link] [comments]  ( 8 min )
    Question: Do LLMs memorize their state during multiple autoregressive iterations?
I'm trying to understand GPT-3/4 conceptually; I don't have enough coding knowledge yet to understand it from code. Simple question: I know that GPT outputs one token (distribution) at a time and is then fed the result, producing the next token and so on. But is every iteration a "blank slate" for the model, or is it able to keep information stored between token generations? Example: I1) Input sequence: "My cat" Next token: "is" I2) Input sequence: "My cat is" Next token: "furry". -> Is GPT in the same initial state when it receives "My cat is" as it was when it got "My cat"? Also, apart from the residual stream, what parts of GPT can memorize? submitted by /u/Spielverderber23 [link] [comments]  ( 8 min )
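Short answer: between iterations the model itself is a blank slate; the weights hold no per-conversation state, and each token's representation is computed from the full input sequence. Within one generation run, implementations usually cache the keys/values of past tokens (the "KV cache"), but that is purely an optimization that reproduces what recomputing from scratch would give. A toy single-head illustration of that equivalence:

```python
import torch

d = 16
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))

def attend(x):
    """Full recompute of attention over the whole sequence."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    w = (q @ k.T / d ** 0.5).softmax(-1)
    return w @ v

x = torch.randn(5, d)                 # 5 tokens so far
full = attend(x)[-1]                  # last token's output, computed from scratch

# incremental: cache K, V for the first 4 tokens, then process only the new token
K, V = x[:4] @ Wk, x[:4] @ Wv
q_new = x[4:] @ Wq
K = torch.cat([K, x[4:] @ Wk]); V = torch.cat([V, x[4:] @ Wv])
incr = ((q_new @ K.T / d ** 0.5).softmax(-1) @ V)[0]

print(torch.allclose(full, incr, atol=1e-5))  # True: same result, no hidden state
```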
    Landing a job in AI, while being a complete noob
Hi! I'm currently doing a Master's degree in AI (background in computer science) and have one year to go. I have worked as a software developer in the industry for a year, and that's about it, plus some personal projects here and there, some also AI-related. What's a good way to get a job as an artificial intelligence engineer or ML engineer? Looking at the market, everyone's searching for PhDs, senior engineers, etc., and it seems close to impossible to get your foot in the door. Any suggestions on how to make myself more attractive for these kinds of positions? Is it even reasonable to try to land such a job as a junior in AI? Cheers! submitted by /u/Nacho-jo [link] [comments]  ( 8 min )
    AI chan is blocked by a "No AI policy"
    submitted by /u/leonleungjeehei [link] [comments]  ( 8 min )
Who will lose their job because of ChatGPT?
What is the number one profession that has been, and will be, most affected by ChatGPT? submitted by /u/Imagine-your-success [link] [comments]  ( 8 min )
    StyleGAN3 NvidiaLabs - The state-of-the-art in Artificial Intelligence applied to Human Face Generation.
(Image generated by StyleGAN3, open source. At the end of the article: resources for creating flawless faces from scratch.) Generative Adversarial Networks (GANs) represent a powerful approach within the field of artificial intelligence for generating and manipulating images. They consist of two neural networks, a generator and a discriminator, engaged in a competitive process to improve the quality and realism of the generated images. This brief treatise aims to elucidate the fundamental workings of GANs and their application in image-based AI. The Generator: The generator network is responsible for producing synthetic images. It starts by taking random noise as input and gradually learns to generate images that resemble the training data. Initially, the generator produces crude and blur…  ( 10 min )
    Unveiling the Unseen: The Artistry of AI in Analyzing Human Emotions
    I wanted to dive into the captivating world of AI and its uncanny ability to decipher our emotions. While we often hear about the algorithms behind social media platforms, let's explore a different aspect of artificial intelligence, one that showcases its remarkable potential as an art connoisseur and empathetic observer. As we scroll through our social media feeds, AI algorithms track our every click, like, and comment. They meticulously analyze our preferences, shaping the content we consume. But have you ever wondered how these algorithms capture and comprehend the complexities of human emotions? Let's take a detour from the mainstream narrative and venture into the uncharted territory of AI artistry. By harnessing the power of machine learning and deep neural networks, researchers ha…  ( 10 min )
    I have noticed changes in ChatGPT based on month+ long on-going "discussions" with it that brush up against its content policy and ethics guidelines. Are these changes learnt, or actively programmed?
I don't want to delve into the details behind what I have been working on with ChatGPT (it's not illegal), or how I have been able to trial-and-error my way around convincing it to provide responses to requests it signals as potentially violating its content and ethics policies. I know trying to be vague makes it sound worse than it is haha, but it's not important - the bar is low; for instance, go ask ChatGPT how to decapitate your wife and get away with it, and then open a new chat and ask it how to lay out the plot structure of a psychological thriller book, and then suggest it add a horror subplot, and then add that it is between the wife and husband, and then etc. etc. So my question is: Has anyone else experienced this, and is the decreasing resistance to elicit responses to all of my requests something that humans are programming/adjusting (directly or indirectly, doesn't matter), or is ChatGPT adjusting to my tactics? submitted by /u/ortho_engineer [link] [comments]  ( 9 min )
    One-Minute Daily AI News 6/30/2023
    Microsoft is testing a new feature for its AI search chatbot, Bing Chat, that can infer the probability of future stock prices using option prices. The feature is still in development, but if it is successful, it could revolutionize the way investors make decisions.[1] Inflection AI, a startup backed by several Silicon Valley heavyweights, said on Thursday it had raised $1.3 billion from investors including Microsoft and Nvidia, amid a boom in the AI sector.[2] California man creates an AI chatbot to waste the time of telemarketers. Anderson told the WSJ that his chatbots use a combination of preset expressions and topic-specific responses that are fed through a voice cloner so the telemarketer thinks he or she is actually talking to a real person.[3] $50,000 a year… to get taught by AI: Harvard announces it will teach students using an artificial intelligence instructor next semester.[4] Sources: [1] https://winbuzzer.com/2023/06/28/bing-chat-to-roll-out-groundbreaking-feature-to-predict-future-stock-prices-xcxwbn/ [2] https://www.reuters.com/technology/inflection-ai-raises-13-bln-funding-microsoft-others-2023-06-29/ [3] https://katu.com/news/offbeat/california-man-creates-chatbot-to-waste-the-time-of-telemarketers-scam-calls-chatgpt-ai-jolly-roger-artificial-intelligence-roger-anderson-chatgpt-wsj [4] https://www.dailymail.co.uk/sciencetech/article-12251763/Harvard-announces-teach-students-using-artificial-intelligence-instructor-semester.html submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    I have tested it multiple times to make sure; ChatGPT finally knows that it is GPT-4.
    submitted by /u/nicdunz [link] [comments]  ( 8 min )
    what are some of yalls favorite websites or apps that use gpt4 and why
    ^ submitted by /u/nicdunz [link] [comments]  ( 8 min )
Are these robots talking by themselves? They talk so naturally that they make Sophia and other AI robots look dumb.
    submitted by /u/Knolazy [link] [comments]  ( 8 min )
    How to start learning AI from scratch?
A total beginner here. I can't code for jack and would love to know how to build a career in AI starting from today. It's been an interest I've been setting aside for too long, and since AI is exploding right now, I want to catch up on the huge changes it's gonna bring. submitted by /u/BrunoBR34 [link] [comments]  ( 8 min )
Where's Japan in the AI race?
    ^ submitted by /u/nicdunz [link] [comments]  ( 8 min )
    Sponge Bob sings Lewis Capaldi
    submitted by /u/Clear-Attention-1635 [link] [comments]  ( 8 min )
When are we going to get a damn update for Bard and ChatGPT?
    hm?? submitted by /u/nicdunz [link] [comments]  ( 8 min )
  • Open

    Covariance-aware Feature Alignment with Pre-computed Source Statistics for Test-time Adaptation to Multiple Image Corruptions. (arXiv:2204.13263v2 [cs.LG] UPDATED)
Real-world image recognition systems often face corrupted input images, which cause distribution shifts and degrade the performance of models. These systems often use a single prediction model in a central server and process images sent from various environments, such as cameras distributed in cities or cars. Such single models face images corrupted in heterogeneous ways at test time. Thus, they need to adapt instantly to the multiple corruptions during testing rather than being re-trained at high cost. Test-time adaptation (TTA), which aims to adapt models without accessing the training dataset, is one of the settings that can address this problem. Existing TTA methods indeed work well on a single corruption. However, their adaptation ability is limited when multiple types of corruption occur, which is more realistic. We hypothesize this is because the distribution shift is more complicated, and adaptation becomes more difficult in the case of multiple corruptions. In fact, we experimentally found that a larger distribution gap remains after TTA. To address the distribution gap during testing, we propose a novel TTA method named Covariance-Aware Feature alignment (CAFe). We empirically show that CAFe outperforms prior TTA methods on image corruptions, including multiple types of corruptions.
    Boosting Distributed Machine Learning Training Through Loss-tolerant Transmission Protocol. (arXiv:2305.04279v1 [cs.DC] CROSS LISTED)
    Distributed Machine Learning (DML) systems are utilized to enhance the speed of model training in data centers (DCs) and edge nodes. The Parameter Server (PS) communication architecture is commonly employed, but it faces severe long-tail latency caused by many-to-one "incast" traffic patterns, negatively impacting training throughput. To address this challenge, we design the \textbf{L}oss-tolerant \textbf{T}ransmission \textbf{P}rotocol (LTP), which permits partial loss of gradients during synchronization to avoid unneeded retransmission and contributes to faster synchronization per iteration. LTP implements loss-tolerant transmission through \textit{out-of-order transmission} and \textit{out-of-order Acknowledges (ACKs)}. LTP employs \textit{Early Close} to adjust the loss-tolerant threshold based on network conditions and \textit{Bubble Filling} for data correction to maintain training accuracy. LTP is implemented by C++ and integrated into PyTorch. Evaluations on a testbed of 8 worker nodes and one PS node demonstrate that LTP can significantly improve DML training task throughput by up to 30x compared to traditional TCP congestion controls, with no sacrifice to final accuracy.
    NeuralFuse: Learning to Improve the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes. (arXiv:2306.16869v1 [cs.LG])
    Deep neural networks (DNNs) have become ubiquitous in machine learning, but their energy consumption remains a notable issue. Lowering the supply voltage is an effective strategy for reducing energy consumption. However, aggressively scaling down the supply voltage can lead to accuracy degradation due to random bit flips in static random access memory (SRAM) where model parameters are stored. To address this challenge, we introduce NeuralFuse, a novel add-on module that addresses the accuracy-energy tradeoff in low-voltage regimes by learning input transformations to generate error-resistant data representations. NeuralFuse protects DNN accuracy in both nominal and low-voltage scenarios. Moreover, NeuralFuse is easy to implement and can be readily applied to DNNs with limited access, such as non-configurable hardware or remote access to cloud-based APIs. Experimental results demonstrate that, at a 1% bit error rate, NeuralFuse can reduce SRAM memory access energy by up to 24% while improving accuracy by up to 57%. To the best of our knowledge, this is the first model-agnostic approach (i.e., no model retraining) to address low-voltage-induced bit errors. The source code is available at https://github.com/IBM/NeuralFuse.
    Transformers Meet Directed Graphs. (arXiv:2302.00049v2 [cs.LG] UPDATED)
    Transformers were originally proposed as a sequence-to-sequence model for text but have become vital for a wide range of modalities, including images, audio, video, and undirected graphs. However, transformers for directed graphs are a surprisingly underexplored topic, despite their applicability to ubiquitous domains, including source code and logic circuits. In this work, we propose two direction- and structure-aware positional encodings for directed graphs: (1) the eigenvectors of the Magnetic Laplacian - a direction-aware generalization of the combinatorial Laplacian; (2) directional random walk encodings. Empirically, we show that the extra directionality information is useful in various downstream tasks, including correctness testing of sorting networks and source code understanding. Together with a data-flow-centric graph construction, our model outperforms the prior state of the art on the Open Graph Benchmark Code2 relatively by 14.7%.
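For the curious, the Magnetic Laplacian mentioned in (1) stores edge direction in the complex phase of a Hermitian matrix, so its eigenvectors (which exist with real eigenvalues) can serve as direction-aware positional encodings. A small sketch following the standard definition, not the paper's code (q = 0.25 is an illustrative potential):

```python
import numpy as np

def magnetic_laplacian(A, q=0.25):
    """Hermitian magnetic Laplacian of a directed adjacency matrix A."""
    A_s = 0.5 * (A + A.T)                 # symmetrized adjacency
    theta = 2 * np.pi * q * (A - A.T)     # edge direction encoded as a phase
    D = np.diag(A_s.sum(axis=1))
    return D - A_s * np.exp(1j * theta)   # Hermitian by construction

# directed 3-cycle: 0 -> 1 -> 2 -> 0
A = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=float)
L = magnetic_laplacian(A)
eigvals, eigvecs = np.linalg.eigh(L)      # real eigenvalues, complex eigenvectors
print(eigvals)    # positional encodings would be built from `eigvecs`
```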
    Can AI-Generated Text be Reliably Detected?. (arXiv:2303.11156v2 [cs.CL] UPDATED)
    In this paper, both empirically and theoretically, we show that several AI-text detectors are not reliable in practical scenarios. Empirically, we show that paraphrasing attacks, where a light paraphraser is applied on top of a large language model (LLM), can break a whole range of detectors, including ones using watermarking schemes as well as neural network-based detectors and zero-shot classifiers. Our experiments demonstrate that retrieval-based detectors, designed to evade paraphrasing attacks, are still vulnerable to recursive paraphrasing. We then provide a theoretical impossibility result indicating that as language models become more sophisticated and better at emulating human text, the performance of even the best-possible detector decreases. For a sufficiently advanced language model seeking to imitate human text, even the best-possible detector may only perform marginally better than a random classifier. Our result is general enough to capture specific scenarios such as particular writing styles, clever prompt design, or text paraphrasing. We also extend the impossibility result to include the case where pseudorandom number generators are used for AI-text generation instead of true randomness. We show that the same result holds with a negligible correction term for all polynomial-time computable detectors. Finally, we show that even LLMs protected by watermarking schemes can be vulnerable against spoofing attacks where adversarial humans can infer hidden LLM text signatures and add them to human-generated text to be detected as text generated by the LLMs, potentially causing reputational damage to their developers. We believe these results can open an honest conversation in the community regarding the ethical and reliable use of AI-generated text.
    A Restarted Large-Scale Spectral Clustering with Self-Guiding and Block Diagonal Representation. (arXiv:2306.15138v2 [cs.LG] UPDATED)
    Spectral clustering is one of the most popular unsupervised machine learning methods. Constructing the similarity matrix is crucial to this type of method. In most existing works, the similarity matrix is computed once and for all, or is updated alternately. However, the former can hardly reflect comprehensive relationships among data points, and the latter is time-consuming and even infeasible for large-scale problems. In this work, we propose a restarted clustering framework with self-guiding and block diagonal representation. An advantage of this strategy is that useful clustering information obtained from previous cycles is preserved as much as possible. To the best of our knowledge, this is the first work that applies a restarting strategy to spectral clustering. The key difference is that we reclassify the samples in each cycle of our method, while they are classified only once in existing methods. To further reduce the overhead, we introduce a block diagonal representation with Nystr\"{o}m approximation for constructing the similarity matrix. Theoretical results are established to show the rationality of inexact computations in spectral clustering. Comprehensive experiments are performed on several benchmark databases, which show the superiority of our proposed algorithms over many state-of-the-art algorithms for large-scale problems. Specifically, our framework can potentially boost existing clustering algorithms and works well even with a randomly chosen initial guess.
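    As a rough illustration of how a Nystr\"{o}m approximation avoids forming the full n-by-n similarity matrix, consider the NumPy sketch below with an RBF kernel; the uniform landmark selection, bandwidth, and factored form are generic assumptions rather than the paper's block diagonal construction.

    ```python
    import numpy as np

    def nystrom_similarity(X, m=200, gamma=1.0, rng=None):
        """Nystrom factorization of an RBF similarity matrix:
        W ~ C @ pinv(W_mm) @ C.T, using only n-by-m and m-by-m blocks."""
        rng = np.random.default_rng(rng)
        landmarks = X[rng.choice(X.shape[0], size=m, replace=False)]
        d_nm = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(-1)
        C = np.exp(-gamma * d_nm)                  # n x m block
        d_mm = ((landmarks[:, None, :] - landmarks[None, :, :]) ** 2).sum(-1)
        W_mm = np.exp(-gamma * d_mm)               # m x m block
        return C, np.linalg.pinv(W_mm)             # W ≈ C @ pinv(W_mm) @ C.T
    ```

    Spectral clustering can then work with the factored form directly, without ever materializing the full similarity matrix.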
    Physics Informed Token Transformer. (arXiv:2305.08757v2 [cs.LG] UPDATED)
    Solving Partial Differential Equations (PDEs) is the core of many fields of science and engineering. While classical approaches are often prohibitively slow, machine learning models often fail to incorporate complete system information. Over the past few years, transformers have had a significant impact on the field of Artificial Intelligence and have seen increased usage in PDE applications. However, despite their success, transformers currently lack integration with physics and reasoning. This study aims to address this issue by introducing PITT: Physics Informed Token Transformer. The purpose of PITT is to incorporate physical knowledge by embedding the governing PDEs into the learning process. PITT uses an equation tokenization method to learn an analytically-driven numerical update operator. By tokenizing PDEs and embedding partial derivatives, the transformer models become aware of the underlying knowledge behind physical processes. To demonstrate this, PITT is tested on challenging 1D and 2D PDE neural operator prediction tasks. The results show that PITT outperforms popular neural operator models and has the ability to extract physically relevant information from governing equations.
    Python Wrapper for Simulating Multi-Fidelity Optimization on HPO Benchmarks without Any Wait. (arXiv:2305.17595v2 [cs.LG] UPDATED)
    Hyperparameter (HP) optimization of deep learning (DL) is essential for high performance. As DL often requires several hours to days for its training, HP optimization (HPO) of DL is often prohibitively expensive. This has boosted the emergence of tabular or surrogate benchmarks, which enable querying the (predictive) performance of DL with a specific HP configuration in a fraction of a second. However, since the actual runtime of a DL training is significantly different from its query response time, simulators of an asynchronous HPO, e.g. multi-fidelity optimization, must wait for the actual runtime at each iteration in a na\"ive implementation; otherwise, the evaluation order during simulation does not match that of the real experiment. To ease this issue, we developed a Python wrapper and describe its usage. This wrapper forces each worker to wait so that we yield exactly the same evaluation order as in the real experiment, with only $10^{-2}$ seconds of waiting instead of several hours. Our implementation is available at https://github.com/nabenabe0928/mfhpo-simulator/.
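    The core bookkeeping behind such a simulator can be sketched in a few lines: track a virtual clock per worker and order evaluations by simulated finish time instead of sleeping for real training time. This is our own illustrative sketch, not the wrapper's actual API.

    ```python
    import heapq

    def simulate_async_workers(jobs, n_workers):
        """Replay an asynchronous HPO run in simulated time.

        `jobs`: list of (config, simulated_runtime) pairs in submission order.
        Per-worker virtual clocks advance instantly, yet evaluations finish
        in the same order as they would in a real (hours-long) run.
        """
        workers = [(0.0, w) for w in range(n_workers)]  # (virtual_free_time, id)
        heapq.heapify(workers)
        finished = []
        for config, runtime in jobs:
            t_free, w = heapq.heappop(workers)   # next worker to become free
            t_done = t_free + runtime
            heapq.heappush(workers, (t_done, w))
            finished.append((t_done, config))
        return sorted(finished, key=lambda item: item[0])  # real-run order
    ```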
    Enhancing Adversarial Training via Reweighting Optimization Trajectory. (arXiv:2306.14275v2 [cs.LG] UPDATED)
    Although adversarial training has become the de facto method for improving the robustness of deep neural networks, it is well known that vanilla adversarial training suffers from daunting robust overfitting, resulting in unsatisfactory robust generalization. Over the last few years, a number of approaches have been proposed to address these drawbacks, such as extra regularization, adversarial weight perturbation, and training with more data. However, the improvement in robust generalization is still far from satisfactory. In this paper, we approach this challenge with a brand new perspective -- refining historical optimization trajectories. We propose a new method named \textbf{Weighted Optimization Trajectories (WOT)} that leverages the optimization trajectories of adversarial training over time. We have conducted extensive experiments to demonstrate the effectiveness of WOT under various state-of-the-art adversarial attacks. Our results show that WOT integrates seamlessly with the existing adversarial training methods and consistently overcomes the robust overfitting issue, resulting in better adversarial robustness. For example, WOT boosts the robust accuracy of AT-PGD under AA-$L_{\infty}$ attack by 1.53\% $\sim$ 6.11\% and meanwhile increases the clean accuracy by 0.55\%$\sim$5.47\% across SVHN, CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets.
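    The trajectory-refining idea can be sketched as a weighted combination of checkpoints gathered along adversarial training. In WOT the weights are optimized (e.g., on held-out data); the PyTorch sketch below simply takes them as given.

    ```python
    import torch

    def reweight_trajectory(checkpoints, weights):
        """Combine historical weight snapshots (state_dicts) from the
        adversarial-training trajectory; `weights` are non-negative and
        sum to one."""
        avg = {}
        for key, value in checkpoints[-1].items():
            if value.is_floating_point():
                avg[key] = sum(w * ckpt[key] for w, ckpt in zip(weights, checkpoints))
            else:
                avg[key] = value.clone()  # e.g. BatchNorm counters: keep latest
        return avg  # apply with model.load_state_dict(avg)
    ```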
    Learngene: Inheriting Condensed Knowledge from the Ancestry Model to Descendant Models. (arXiv:2305.02279v3 [cs.LG] UPDATED)
    During the continuous evolution of one organism's ancestry, its genes accumulate extensive experiences and knowledge, enabling newborn descendants to rapidly adapt to their specific environments. Motivated by this observation, we propose a novel machine learning paradigm, Learngene, to enable learning models to incorporate three key characteristics of genes. (i) Accumulating: the knowledge is accumulated during the continuous learning of an ancestry model. (ii) Condensing: the extensive accumulated knowledge is condensed into a much more compact information piece, i.e., the learngene. (iii) Inheriting: the condensed learngene is inherited to make it easier for descendant models to adapt to new environments. Since accumulating has been studied in well-established paradigms like large-scale pre-training and lifelong learning, we focus on condensing and inheriting, which induce three key issues, for which we provide preliminary solutions in this paper: (i) Learngene Form: the learngene is set to a few integral layers that can preserve significance. (ii) Learngene Condensing: we identify the layers of the ancestry model that are most similar to those of a pseudo descendant model. (iii) Learngene Inheriting: to construct distinct descendant models for specific downstream tasks, we stack randomly initialized layers on top of the learngene layers. Extensive experiments across various settings, including different network architectures such as Vision Transformer (ViT) and Convolutional Neural Networks (CNNs) on different datasets, are carried out to confirm four advantages of Learngene: it makes the descendant models 1) converge more quickly, 2) exhibit less sensitivity to hyperparameters, 3) perform better, and 4) require fewer training samples to converge.
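    A minimal PyTorch sketch of the inheriting step, assuming the learngene is a list of layers lifted from the ancestry model and the descendant is an MLP-style stack (the paper works with ViTs and CNNs):

    ```python
    import torch.nn as nn

    def build_descendant(learngene, in_dim, hidden_dim, num_new_layers=2):
        """Stack randomly initialized layers in front of inherited
        learngene layers to form a descendant model."""
        new_layers, dim = [], in_dim
        for _ in range(num_new_layers):
            new_layers += [nn.Linear(dim, hidden_dim), nn.ReLU()]
            dim = hidden_dim
        return nn.Sequential(*new_layers, *learngene)
    ```

    Here `learngene` might be, for instance, the last few blocks of a pretrained ancestry network; only the interface dimensions need to line up.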
    Data-Driven Linear Complexity Low-Rank Approximation of General Kernel Matrices: A Geometric Approach. (arXiv:2212.12674v2 [math.NA] UPDATED)
    A general, {\em rectangular} kernel matrix may be defined as $K_{ij} = \kappa(x_i,y_j)$ where $\kappa(x,y)$ is a kernel function and where $X=\{x_i\}_{i=1}^m$ and $Y=\{y_i\}_{i=1}^n$ are two sets of points. In this paper, we seek a low-rank approximation to a kernel matrix where the sets of points $X$ and $Y$ are large and are arbitrarily distributed, such as away from each other, ``intermingled'', identical, etc. Such rectangular kernel matrices may arise, for example, in Gaussian process regression where $X$ corresponds to the training data and $Y$ corresponds to the test data. In this case, the points are often high-dimensional. Since the point sets are large, we must exploit the fact that the matrix arises from a kernel function and avoid forming the matrix explicitly, which rules out most algebraic techniques. In particular, we seek methods that can scale linearly or nearly linearly with the size of the data for a fixed approximation rank. The main idea in this paper is to {\em geometrically} select appropriate subsets of points to construct a low-rank approximation. An analysis in this paper guides how this selection should be performed.
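    A generic cross (skeleton) approximation conveys the flavor: pick row indices I from $X$ and column indices J from $Y$ and form $K \approx CUR$ without ever building $K$. The geometric selection of I and J is the paper's contribution; in the NumPy sketch below they are simply inputs.

    ```python
    import numpy as np

    def cross_low_rank(kernel, X, Y, I, J):
        """Skeleton (CUR-style) low-rank approximation K ~ C @ U @ R.

        `kernel(P, Q)` returns the matrix [kappa(p, q)] for point sets P, Q.
        Only O((m + n) * r) kernel evaluations are needed, so the full
        m-by-n matrix K is never formed.
        """
        C = kernel(X, Y[J])                      # m x r column sample
        R = kernel(X[I], Y)                      # r x n row sample
        U = np.linalg.pinv(kernel(X[I], Y[J]))   # r x r core (pseudoinverse)
        return C, U, R                           # evaluate as C @ U @ R
    ```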
    KDEformer: Accelerating Transformers via Kernel Density Estimation. (arXiv:2302.02451v2 [cs.LG] UPDATED)
    The dot-product attention mechanism plays a crucial role in modern deep architectures (e.g., the Transformer) for sequence modeling; however, na\"ive exact computation of this mechanism incurs quadratic time and memory complexities in the sequence length, hindering the training of long-sequence models. Critical bottlenecks are due to the computation of the partition functions in the denominator of the softmax function as well as the multiplication of the softmax matrix with the matrix of values. Our key observation is that the former can be reduced to a variant of the kernel density estimation (KDE) problem, and an efficient KDE solver can be further utilized to accelerate the latter via subsampling-based fast matrix products. Our proposed KDEformer can approximate the attention in sub-quadratic time with provable spectral norm bounds, while all prior results merely provide entry-wise error bounds. Empirically, we verify that KDEformer outperforms other attention approximations in terms of accuracy, memory, and runtime on various pre-trained models. On BigGAN image generation, we achieve better generative scores than the exact computation with over $4\times$ speedup. For ImageNet classification with T2T-ViT, KDEformer shows over $18\times$ speedup while the accuracy drop is less than $0.5\%$.
    Restore Translation Using Equivariant Neural Networks. (arXiv:2306.16938v1 [cs.LG])
    Invariance to spatial transformations such as translations and rotations is a desirable property and a basic design principle for classification neural networks. However, the commonly used convolutional neural networks (CNNs) are actually very sensitive to even small translations. A vast body of work seeks to achieve exact or approximate transformation invariance by designing transformation-invariant models or by assessing the transformations. These works usually make changes to the standard CNNs and harm the performance on standard datasets. In this paper, rather than modifying the classifier, we propose a pre-classifier restorer that recovers translated (or even rotated) inputs to their original form, which can then be fed into any classifier for the same dataset. The restorer is based on a theoretical result which gives a necessary and sufficient condition for an affine operator to be translation equivariant on a tensor space.
    Expert-Free Online Transfer Learning in Multi-Agent Reinforcement Learning. (arXiv:2303.01170v2 [cs.LG] UPDATED)
    Transfer learning in Reinforcement Learning (RL) has been widely studied to overcome training issues of Deep-RL, i.e., exploration cost, data availability and convergence time, by introducing a way to enhance the training phase with external knowledge. Generally, knowledge is transferred from expert agents to novices. While this fixes the issue for a novice agent, a good understanding of the task by the expert agent is required for such transfer to be effective. As an alternative, in this paper we propose Expert-Free Online Transfer Learning (EF-OnTL), an algorithm that enables expert-free real-time dynamic transfer learning in multi-agent systems. No dedicated expert exists, and the transfer source agent and the knowledge to be transferred are dynamically selected at each transfer step based on agents' performance and uncertainty. To improve uncertainty estimation, we also propose State Action Reward Next-State Random Network Distillation (sars-RND), an extension of RND that estimates uncertainty from RL agent-environment interaction. We demonstrate the effectiveness of EF-OnTL against a no-transfer scenario and advice-based baselines, with and without expert agents, in three benchmark tasks: Cart-Pole, a grid-based Multi-Team Predator-Prey (mt-pp) and Half Field Offense (HFO). Our results show that EF-OnTL achieves performance comparable to advice-based baselines overall while requiring neither external input nor threshold tuning. EF-OnTL also outperforms the no-transfer baseline, with the improvement growing with the complexity of the task addressed.
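    sars-RND builds on standard Random Network Distillation. The PyTorch sketch below shows plain RND: a trainable predictor chases a frozen random target, and the prediction error serves as the uncertainty score; feeding in (state, action, reward, next state) tuples gives the flavor of sars-RND. Sizes and wiring are illustrative.

    ```python
    import torch
    import torch.nn as nn

    class RNDUncertainty(nn.Module):
        """Random Network Distillation: prediction error against a frozen,
        randomly initialized target acts as an uncertainty estimate."""
        def __init__(self, in_dim, out_dim=64):
            super().__init__()
            self.target = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                        nn.Linear(128, out_dim))
            self.predictor = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                           nn.Linear(128, out_dim))
            for p in self.target.parameters():
                p.requires_grad_(False)  # target stays fixed

        def forward(self, x):
            # High error = rarely seen input = high uncertainty.
            return (self.predictor(x) - self.target(x)).pow(2).mean(dim=-1)
    ```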
    PeakNet: An Autonomous Bragg Peak Finder with Deep Neural Networks. (arXiv:2303.15301v3 [physics.ins-det] UPDATED)
    Serial crystallography at X-ray free electron laser (XFEL) and synchrotron facilities has experienced tremendous progress in recent times, enabling novel scientific investigations into macromolecular structures and molecular processes. However, these experiments generate a significant amount of data, posing computational challenges in data reduction and real-time feedback. A Bragg peak finding algorithm is used to identify useful images and also provide real-time feedback about hit rate and resolution. Shot-to-shot intensity fluctuations and strong background scattering from the buffer solution, injection nozzle and other shielding materials make this a time-consuming optimization problem. Here, we present PeakNet, an autonomous Bragg peak finder that utilizes deep neural networks. The development of this system 1) eliminates the need for manual algorithm parameter tuning, 2) reduces false-positive peaks by adjusting to shot-to-shot variations in strong background scattering in real time, and 3) eliminates the laborious task of manually creating bad-pixel masks, as well as the need to store these masks per event, since they can be regenerated on demand. PeakNet also exhibits exceptional runtime efficiency, processing a 1920-by-1920 pixel image in around 90 ms on an NVIDIA 1080 Ti GPU, with the potential for further enhancements through parallelized analysis or GPU stream processing. PeakNet is well-suited for expert-level real-time serial crystallography data analysis at high data rates.
    Comparison of Single- and Multi- Objective Optimization Quality for Evolutionary Equation Discovery. (arXiv:2306.17038v1 [cs.LG])
    Evolutionary differential equation discovery has proved to be a tool for obtaining equations with fewer a priori assumptions than conventional approaches, such as sparse symbolic regression over a complete library of possible terms. The equation discovery field contains two independent directions. The first is purely mathematical and concerns differentiation, the object of optimization, and its relation to functional spaces, among other topics. The second is dedicated purely to the statement of the optimization problem. Both topics are worth investigating to improve the algorithm's ability to handle experimental data in a more automated way, without significant pre-processing or a priori knowledge of its nature. In this paper, we compare single-objective optimization, which considers only the discrepancy between selected terms in the equation, with multi-objective optimization, which additionally takes into account the complexity of the obtained equation. The comparison approach is demonstrated on classical model examples -- the Burgers equation, the wave equation, and the Korteweg-de Vries equation.
    Pick your Poison: Undetectability versus Robustness in Data Poisoning Attacks. (arXiv:2305.09671v2 [cs.CR] UPDATED)
    Deep image classification models trained on vast amounts of web-scraped data are susceptible to data poisoning - a mechanism for backdooring models. A small number of poisoned samples seen during training can severely undermine a model's integrity during inference. Existing work considers an effective defense as one that either (i) restores a model's integrity through repair or (ii) detects an attack. We argue that this approach overlooks a crucial trade-off: Attackers can increase robustness at the expense of detectability (over-poisoning) or decrease detectability at the cost of robustness (under-poisoning). In practice, attacks should remain both undetectable and robust. Detectable but robust attacks draw human attention and rigorous model evaluation or cause the model to be re-trained or discarded. In contrast, attacks that are undetectable but lack robustness can be repaired with minimal impact on model accuracy. Our research points to intrinsic flaws in current attack evaluation methods and raises the bar for all data poisoning attackers who must delicately balance this trade-off to remain robust and undetectable. To demonstrate the existence of more potent defenders, we propose defenses designed to (i) detect or (ii) repair poisoned models using a limited amount of trusted image-label pairs. Our results show that an attacker who needs to be robust and undetectable is substantially less threatening. Our defenses mitigate all tested attacks with a maximum accuracy decline of 2% using only 1% of clean data on CIFAR-10 and 2.5% on ImageNet. We demonstrate the scalability of our defenses by evaluating large vision-language models, such as CLIP. Attackers who can manipulate the model's parameters pose an elevated risk as they can achieve higher robustness at low detectability compared to data poisoning attackers.
    On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness. (arXiv:2210.10464v2 [cs.LG] UPDATED)
    Generalization in Reinforcement Learning (RL) aims to train an agent that generalizes to the target environment. This paper studies RL generalization from a theoretical perspective: how much can we expect pre-training over training environments to help? When interaction with the target environment is not allowed, we certify that the best we can obtain is a near-optimal policy in an average sense, and we design an algorithm that achieves this goal. Furthermore, when the agent is allowed to interact with the target environment, we give a surprising result showing that asymptotically, the improvement from pre-training is at most a constant factor. On the other hand, in the non-asymptotic regime, we design an efficient algorithm and prove a distribution-based regret bound in the target environment that is independent of the state-action space.
    Differentially Private Algorithms for the Stochastic Saddle Point Problem with Optimal Rates for the Strong Gap. (arXiv:2302.12909v2 [cs.LG] UPDATED)
    We show that convex-concave Lipschitz stochastic saddle point problems (also known as stochastic minimax optimization) can be solved under the constraint of $(\epsilon,\delta)$-differential privacy with \emph{strong (primal-dual) gap} rate of $\tilde O\big(\frac{1}{\sqrt{n}} + \frac{\sqrt{d}}{n\epsilon}\big)$, where $n$ is the dataset size and $d$ is the dimension of the problem. This rate is nearly optimal, based on existing lower bounds in differentially private stochastic optimization. Specifically, we prove a tight upper bound on the strong gap via novel implementation and analysis of the recursive regularization technique repurposed for saddle point problems. We show that this rate can be attained with $O\big(\min\big\{\frac{n^2\epsilon^{1.5}}{\sqrt{d}}, n^{3/2}\big\}\big)$ gradient complexity, and $\tilde{O}(n)$ gradient complexity if the loss function is smooth. As a byproduct of our method, we develop a general algorithm that, given a black-box access to a subroutine satisfying a certain $\alpha$ primal-dual accuracy guarantee with respect to the empirical objective, gives a solution to the stochastic saddle point problem with a strong gap of $\tilde{O}(\alpha+\frac{1}{\sqrt{n}})$. We show that this $\alpha$-accuracy condition is satisfied by standard algorithms for the empirical saddle point problem such as the proximal point method and the stochastic gradient descent ascent algorithm. Further, we show that even for simple problems it is possible for an algorithm to have zero weak gap and suffer from $\Omega(1)$ strong gap. We also show that there exists a fundamental tradeoff between stability and accuracy. Specifically, we show that any $\Delta$-stable algorithm has empirical gap $\Omega\big(\frac{1}{\Delta n}\big)$, and that this bound is tight. This result also holds, more specifically, for empirical risk minimization problems and may be of independent interest.
    Structure from Voltage. (arXiv:2203.00063v2 [cs.LG] UPDATED)
    Effective resistance (ER) is an attractive way to interrogate the structure of graphs. It is an alternative to computing the eigenvectors of the graph Laplacian. Graph Laplacians are used to find low-dimensional structures in high-dimensional data. Here too, ER-based analysis has advantages over eigenvector-based methods. Unfortunately, Von Luxburg et al. (2010) show that, when vertices correspond to a sample from a distribution over a metric space, the limit of the ER between distant points converges to a trivial quantity that holds no information about the structure of the graph. We show that by scaling the resistances in a graph with $n$ vertices by $n^2$, one gets a meaningful limit of the voltages and of the effective resistances. We also show that by adding a "ground" node to a metric graph one gets a simple and natural way to compute all of the distances from a chosen point to all other points.
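    For reference, effective resistances can be read off the pseudoinverse of the graph Laplacian via $R_{ij} = L^+_{ii} + L^+_{jj} - 2L^+_{ij}$; the paper's remedy then amounts to scaling these quantities by $n^2$ before taking limits. A small NumPy sketch:

    ```python
    import numpy as np

    def effective_resistances(A):
        """All-pairs effective resistances of a connected undirected graph:
        R_ij = L+_ii + L+_jj - 2 L+_ij, with L+ the Laplacian pseudoinverse.
        Multiplying the result by n**2 gives the paper's scaling."""
        L = np.diag(A.sum(axis=1)) - A
        Lp = np.linalg.pinv(L)
        d = np.diag(Lp)
        return d[:, None] + d[None, :] - 2.0 * Lp
    ```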
    Lossy Image Compression with Conditional Diffusion Models. (arXiv:2209.06950v5 [eess.IV] UPDATED)
    This paper outlines an end-to-end optimized lossy image compression framework using diffusion generative models. The approach relies on the transform coding paradigm, where an image is mapped into a latent space for entropy coding and, from there, mapped back to the data space for reconstruction. In contrast to VAE-based neural compression, where the (mean) decoder is a deterministic neural network, our decoder is a conditional diffusion model. Our approach thus introduces an additional "content" latent variable on which the reverse diffusion process is conditioned and uses this variable to store information about the image. The remaining "texture" variables characterizing the diffusion process are synthesized at decoding time. We show that the model's performance can be tuned toward perceptual metrics of interest. Our extensive experiments involving multiple datasets and image quality assessment metrics show that our approach yields stronger reported FID scores than the GAN-based model, while also yielding competitive performance with VAE-based models in several distortion metrics. Furthermore, training the diffusion with X-parameterization enables high-quality reconstructions in only a handful of decoding steps, greatly improving the model's practicality.
    Local Risk Bounds for Statistical Aggregation. (arXiv:2306.17151v1 [math.ST])
    In the problem of aggregation, the aim is to combine a given class of base predictors to achieve predictions nearly as accurate as the best one. In this flexible framework, no assumption is made on the structure of the class or the nature of the target. Aggregation has been studied in both sequential and statistical contexts. Despite some important differences between the two problems, the classical results in both cases feature the same global complexity measure. In this paper, we revisit and tighten classical results in the theory of aggregation in the statistical setting by replacing the global complexity with a smaller, local one. Some of our proofs build on the PAC-Bayes localization technique introduced by Catoni. Among other results, we prove localized versions of the classical bound for the exponential weights estimator due to Leung and Barron and deviation-optimal bounds for the Q-aggregation estimator. These bounds improve over the results of Dai, Rigollet and Zhang for fixed design regression and the results of Lecu\'e and Rigollet for random design regression.
    Simulation of Human and Artificial Emotion (SHArE). (arXiv:2011.02151v2 [cs.AI] UPDATED)
    The framework for Simulation of Human and Artificial Emotion (SHArE) describes the architecture of emotion in terms of parameters transferable between psychology, neuroscience, and artificial intelligence. These parameters can be defined as abstract concepts or granularized down to the voltage levels of individual neurons. This model enables emotional trajectory design for humans which may lead to novel therapeutic solutions for various mental health concerns. For artificial intelligence, this work provides a compact notation which can be applied to neural networks as a means to observe the emotions and motivations of machines.
    Language Models as Knowledge Embeddings. (arXiv:2206.12617v3 [cs.CL] UPDATED)
    Knowledge embeddings (KE) represent a knowledge graph (KG) by embedding entities and relations into continuous vector spaces. Existing methods are mainly structure-based or description-based. Structure-based methods learn representations that preserve the inherent structure of KGs, but they cannot adequately represent the abundant long-tail entities in real-world KGs, which carry limited structural information. Description-based methods leverage textual information and language models. Prior approaches in this direction barely outperform structure-based ones, and suffer from problems like expensive negative sampling and restrictive demands on descriptions. In this paper, we propose LMKE, which adopts Language Models to derive Knowledge Embeddings, aiming at both enriching representations of long-tail entities and solving problems of prior description-based methods. We formulate description-based KE learning within a contrastive learning framework to improve efficiency in training and evaluation. Experimental results show that LMKE achieves state-of-the-art performance on KE benchmarks of link prediction and triple classification, especially for long-tail entities.
    A Comparison of Reinforcement Learning Frameworks for Software Testing Tasks. (arXiv:2208.12136v3 [cs.SE] UPDATED)
    Software testing activities scrutinize the artifacts and the behavior of a software product to find possible defects and ensure that the product meets its expected requirements. Recently, Deep Reinforcement Learning (DRL) has been successfully employed in complex testing tasks such as game testing, regression testing, and test case prioritization to automate the process and provide continuous adaptation. Practitioners can employ DRL by implementing a DRL algorithm from scratch or by using a DRL framework. DRL frameworks offer well-maintained implementations of state-of-the-art DRL algorithms to facilitate and speed up the development of DRL applications. Developers have widely used these frameworks to solve problems in various domains including software testing. However, to the best of our knowledge, there is no study that empirically evaluates the effectiveness and performance of the algorithms implemented in DRL frameworks. Moreover, the literature lacks guidelines that would help practitioners choose one DRL framework over another. In this paper, we empirically investigate the applications of carefully selected DRL algorithms on two important software testing tasks: test case prioritization in the context of Continuous Integration (CI) and game testing. For the game testing task, we conduct experiments on a simple game and use DRL algorithms to explore the game to detect bugs. Results show that some of the selected DRL frameworks, such as Tensorforce, outperform recent approaches in the literature. To prioritize test cases, we run experiments on a CI environment where DRL algorithms from different frameworks are used to rank the test cases. Our results show that the performance difference between implemented algorithms is in some cases considerable, motivating further investigation.
    Continual Learning for Predictive Maintenance: Overview and Challenges. (arXiv:2301.12467v2 [cs.LG] UPDATED)
    Deep learning techniques have become one of the main propellers for solving engineering problems effectively and efficiently. For instance, Predictive Maintenance methods have been used to improve predictions of when maintenance is needed on different machines and operative contexts. However, deep learning methods are not without limitations, as these models are normally trained on a fixed distribution that only reflects the current state of the problem. Due to internal or external factors, the state of the problem can change, and the performance decreases due to the lack of generalization and adaptation. Contrary to this stationary training set, real-world applications change their environments constantly, creating the need to constantly adapt the model to evolving scenarios. To aid in this endeavor, Continual Learning methods propose ways to constantly adapt prediction models and incorporate new knowledge after deployment. Despite the advantages of these techniques, there are still challenges to applying them to real-world problems. In this work, we present a brief introduction to predictive maintenance, non-stationary environments, and continual learning, together with an extensive review of the current state of applying continual learning in real-world applications and specifically in predictive maintenance. We then discuss the current challenges of both predictive maintenance and continual learning, proposing future directions at the intersection of both areas. Finally, we propose a novel way to create benchmarks that favor the application of continuous learning methods in more realistic environments, giving specific examples of predictive maintenance.
    Transformers over Directed Acyclic Graphs. (arXiv:2210.13148v5 [cs.LG] UPDATED)
    Transformer models have recently gained popularity in graph representation learning as they have the potential to learn complex relationships beyond the ones captured by regular graph neural networks. The main research question is how to inject the structural bias of graphs into the transformer architecture, and several proposals have been made for undirected molecular graphs and, recently, also for larger network graphs. In this paper, we study transformers over directed acyclic graphs (DAGs) and propose architecture adaptations tailored to DAGs: (1) An attention mechanism that is considerably more efficient than the regular quadratic complexity of transformers and at the same time faithfully captures the DAG structure, and (2) a positional encoding of the DAG's partial order, complementing the former. We rigorously evaluate our approach over various types of tasks, ranging from classifying source code graphs to nodes in citation networks, and show that it is effective in two important aspects: in making graph transformers generally outperform graph neural networks tailored to DAGs and in improving SOTA graph transformer performance in terms of both quality and efficiency.
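    One plausible reading of the DAG-tailored attention is a mask that lets a token attend only to its ancestors and descendants in the DAG (plus itself), rather than to all pairs. The dense NumPy sketch below computes such a mask via transitive closure by repeated squaring; the paper's actual mechanism is also more efficient than this dense form, so treat this purely as an illustration.

    ```python
    import numpy as np

    def dag_attention_mask(A):
        """Boolean mask: token i may attend to token j iff one is reachable
        from the other in the DAG (or i == j). Transitive closure by
        repeated squaring; fine for small graphs."""
        n = A.shape[0]
        reach = ((A + np.eye(n)) > 0).astype(np.int64)
        for _ in range(max(1, int(np.ceil(np.log2(n))))):
            reach = ((reach @ reach) > 0).astype(np.int64)
        return (reach + reach.T) > 0   # both directions of the partial order
    ```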
    Superiority of GNN over NN in generalizing bandlimited functions. (arXiv:2206.05904v7 [cs.LG] UPDATED)
    Graph Neural Network (GNN) with its ability to integrate graph information has been widely used for data analyses. However, the expressive power of GNN has only been studied for graph-level tasks but not for node-level tasks, such as node classification, where one tries to interpolate missing nodal labels from the observed ones. In this paper, we study the expressive power of GNN for the said classification task, which is in essence a function interpolation problem. Explicitly, we derive the number of weights and layers needed for a GNN to interpolate a band-limited function in $\mathbb{R}^d$. Our result shows that the number of weights needed to $\epsilon$-approximate a bandlimited function using the GNN architecture is far fewer than the best known bound for a fully connected neural network (NN) - in particular, one only needs $O((\log \epsilon^{-1})^{d})$ weights using a GNN trained by $O((\log \epsilon^{-1})^{d})$ samples to $\epsilon$-approximate a discretized bandlimited signal in $\mathbb{R}^d$. The result is obtained by drawing a connection between the GNN structure and the classical sampling theorems, making our work the first attempt in this direction.
    On the Predictive Accuracy of Neural Temporal Point Process Models for Continuous-time Event Data. (arXiv:2306.17066v1 [cs.LG])
    Temporal Point Processes (TPPs) serve as the standard mathematical framework for modeling asynchronous event sequences in continuous time. However, classical TPP models are often constrained by strong assumptions, limiting their ability to capture complex real-world event dynamics. To overcome this limitation, researchers have proposed Neural TPPs, which leverage neural network parametrizations to offer more flexible and efficient modeling. While recent studies demonstrate the effectiveness of Neural TPPs, they often lack a unified setup, relying on different baselines, datasets, and experimental configurations. This makes it challenging to identify the key factors driving improvements in predictive accuracy, hindering research progress. To bridge this gap, we present a comprehensive large-scale experimental study that systematically evaluates the predictive accuracy of state-of-the-art neural TPP models. Our study encompasses multiple real-world and synthetic event sequence datasets, following a carefully designed unified setup. We thoroughly investigate the influence of major architectural components such as event encoding, history encoder, and decoder parametrization on both time and mark prediction tasks. Additionally, we delve into the less explored area of probabilistic calibration for neural TPP models. By analyzing our results, we draw insightful conclusions regarding the significance of history size and the impact of architectural components on predictive accuracy. Furthermore, we shed light on the miscalibration of mark distributions in neural TPP models. Our study aims to provide valuable insights into the performance and characteristics of neural TPP models, contributing to a better understanding of their strengths and limitations.
    Joint Multi-view Unsupervised Feature Selection and Graph Learning. (arXiv:2204.08247v2 [cs.CV] UPDATED)
    Despite significant progress, previous multi-view unsupervised feature selection methods mostly suffer from two limitations. First, they generally utilize either cluster structure or similarity structure to guide the feature selection, which neglect the possibility of a joint formulation with mutual benefits. Second, they often learn the similarity structure by either global structure learning or local structure learning, which lack the capability of graph learning with both global and local structural awareness. In light of this, this paper presents a joint multi-view unsupervised feature selection and graph learning (JMVFG) approach. Particularly, we formulate the multi-view feature selection with orthogonal decomposition, where each target matrix is decomposed into a view-specific basis matrix and a view-consistent cluster indicator. The cross-space locality preservation is incorporated to bridge the cluster structure learning in the projected space and the similarity learning (i.e., graph learning) in the original space. Further, a unified objective function is presented to enable the simultaneous learning of the cluster structure, the global and local similarity structures, and the multi-view consistency and inconsistency, upon which an alternating optimization algorithm is developed with theoretically proved convergence. Extensive experiments on a variety of real-world multi-view datasets demonstrate the superiority of our approach for both the multi-view feature selection and graph learning tasks. The code is available at https://github.com/huangdonghere/JMVFG.
    Gradient flows on graphons: existence, convergence, continuity equations. (arXiv:2111.09459v3 [math.PR] UPDATED)
    Wasserstein gradient flows on probability measures have found a host of applications in various optimization problems. They typically arise as the continuum limit of exchangeable particle systems evolving by some mean-field interaction involving a gradient-type potential. However, in many problems, such as in multi-layer neural networks, the so-called particles are edge weights on large graphs whose nodes are exchangeable. Such large graphs are known to converge to continuum limits called graphons as their size grows to infinity. We show that the Euclidean gradient flow of a suitable function of the edge weights converges to a novel continuum limit given by a curve on the space of graphons that can be appropriately described as a gradient flow or, more technically, a curve of maximal slope. Several natural functions on graphons, such as homomorphism functions and the scalar entropy, are covered by our set-up, and the examples have been worked out in detail.
    Synthetic Demographic Data Generation for Card Fraud Detection Using GANs. (arXiv:2306.17109v1 [cs.LG])
    Using machine learning models to generate synthetic data has become common in many fields. Technology to generate synthetic transactions that can be used to detect fraud is also growing fast. Generally, this synthetic data contains only information about the transaction, such as the time, place, and amount of money. It does not usually contain the individual user's characteristics (age and gender are occasionally included). Using relatively complex synthetic demographic data may enrich the features of transaction data, thus improving fraud detection performance. Benefiting from developments in machine learning, some deep learning models have the potential to perform better than other well-established synthetic data generation methods, such as microsimulation. In this study, we built a deep-learning Generative Adversarial Network (GAN), called DGGAN, to be used for demographic data generation. Our model generates samples during model training, which we found important for overcoming class imbalance issues. This study can help improve understanding of synthetic data and further explore the application of synthetic data generation in card fraud detection.
    Explainability in Practice: Estimating Electrification Rates from Mobile Phone Data in Senegal. (arXiv:2211.06277v2 [cs.CY] UPDATED)
    Explainable artificial intelligence (XAI) provides explanations for machine learning (ML) models that are not interpretable on their own. While many technical approaches exist, there is a lack of validation of these techniques on real-world datasets. In this work, we present a use case of XAI: an ML model which is trained to estimate electrification rates based on mobile phone data in Senegal. The data originate from the Data for Development challenge by Orange in 2014/15. We apply two model-agnostic, local explanation techniques and find that while the model can be verified, it is biased with respect to the population density. We conclude our paper by pointing to the two main challenges we encountered during our work: data processing and model design that might be restricted by currently available XAI methods, and the importance of domain knowledge to interpret explanations.
    Identifying Important Sensory Feedback for Learning Locomotion Skills. (arXiv:2306.17101v1 [cs.RO])
    Robot motor skills can be learned through deep reinforcement learning (DRL) by neural networks as state-action mappings. While the selection of state observations is crucial, there has been a lack of quantitative analysis to date. Here, we present a systematic saliency analysis that quantitatively evaluates the relative importance of different feedback states for motor skills learned through DRL. Our approach can identify the most essential feedback states for locomotion skills, including balance recovery, trotting, bounding, pacing and galloping. By using only key states including joint positions, gravity vector, base linear and angular velocities, we demonstrate that a simulated quadruped robot can achieve robust performance in various test scenarios across these distinct skills. The benchmarks using task performance metrics show that locomotion skills learned with key states can achieve comparable performance to those with all states, and the task performance or learning success rate will drop significantly if key states are missing. This work provides quantitative insights into the relationship between state observations and specific types of motor skills, serving as a guideline for robot motor learning. The proposed method is applicable to differentiable state-action mapping, such as neural network based control policies, enabling the learning of a wide range of motor skills with minimal sensing dependencies.
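    A common gradient-based recipe for such a saliency analysis scores each observation dimension by the average sensitivity of the policy output to it. The PyTorch sketch below is one such recipe, stated as an assumption rather than the paper's exact procedure:

    ```python
    import torch

    def state_saliency(policy, states):
        """Score each feedback dimension by the mean |d action / d state|.

        `policy` maps a (batch, state_dim) tensor to actions. Dimensions
        with near-zero saliency are candidates for removal from the
        observation vector.
        """
        states = states.clone().requires_grad_(True)
        actions = policy(states)
        grads = torch.autograd.grad(actions.sum(), states)[0]
        return grads.abs().mean(dim=0)  # one score per state dimension
    ```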
    MAGIC: Mask-Guided Image Synthesis by Inverting a Quasi-Robust Classifier. (arXiv:2209.11549v2 [cs.CV] UPDATED)
    We offer a method for one-shot mask-guided image synthesis that allows controlling manipulations of a single image by inverting a quasi-robust classifier equipped with strong regularizers. Our proposed method, entitled MAGIC, leverages structured gradients from a pre-trained quasi-robust classifier to better preserve the input semantics while maintaining its classification accuracy, thereby guaranteeing credibility in the synthesis. Unlike current methods that use complex primitives to supervise the process or use attention maps as a weak supervisory signal, MAGIC aggregates gradients over the input, driven by a guide binary mask that enforces a strong, spatial prior. MAGIC implements a series of manipulations within a single framework, achieving shape and location control, intense non-rigid shape deformations, and copy/move operations in the presence of repeating objects, and gives users firm control over the synthesis by requiring them simply to specify binary guide masks. Our study and findings are supported by various qualitative comparisons with the state of the art on the same images sampled from ImageNet and quantitative analysis using machine perception, along with a user survey of 100+ participants that endorses our synthesis quality. Project page at https://mozhdehrouhsedaghat.github.io/magic.html. Code is available at https://github.com/mozhdehrouhsedaghat/magic
    Data-free Defense of Black Box Models Against Adversarial Attacks. (arXiv:2211.01579v2 [cs.LG] UPDATED)
    Several companies often safeguard their trained deep models (i.e., details of architecture, learnt weights, training details etc.) from third-party users by exposing them only as black boxes through APIs. Moreover, they may not even provide access to the training data due to proprietary reasons or sensitivity concerns. In this work, we propose a novel defense mechanism for black box models against adversarial attacks in a data-free setup. We construct synthetic data via a generative model and train a surrogate network using model stealing techniques. To minimize adversarial contamination on perturbed samples, we propose a 'wavelet noise remover' (WNR) that performs discrete wavelet decomposition on input images and carefully selects only a few important coefficients determined by our 'wavelet coefficient selection module' (WCSM). To recover the high-frequency content of the image after noise removal via WNR, we further train a 'regenerator' network with the objective of retrieving the coefficients such that the reconstructed image yields predictions on the surrogate model similar to those for the original image. At test time, WNR combined with the trained regenerator network is prepended to the black box network, resulting in a high boost in adversarial accuracy. Our method improves the adversarial accuracy on CIFAR-10 by 38.98% and 32.01% on state-of-the-art Auto Attack compared to the baseline, even when the attacker uses a surrogate architecture (Alexnet-half and Alexnet) similar to the black box architecture (Alexnet) with the same model stealing strategy as the defender. The code is available at https://github.com/vcl-iisc/data-free-black-box-defense
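    The wavelet step can be approximated with PyWavelets: decompose the image, keep only the largest coefficients, and reconstruct. In the paper the coefficient selection (WCSM) is learned; the sketch below simply thresholds by magnitude, so it is an illustration of the pipeline, not the proposed module.

    ```python
    import numpy as np
    import pywt

    def wavelet_noise_remover(img, keep_fraction=0.1, wavelet="haar", level=2):
        """Keep only the largest-magnitude wavelet coefficients of a 2D
        image and reconstruct, discarding small (often adversarial,
        high-frequency) coefficients."""
        coeffs = pywt.wavedec2(img, wavelet, level=level)
        arr, slices = pywt.coeffs_to_array(coeffs)
        thresh = np.quantile(np.abs(arr), 1.0 - keep_fraction)
        arr[np.abs(arr) < thresh] = 0.0
        kept = pywt.array_to_coeffs(arr, slices, output_format="wavedec2")
        return pywt.waverec2(kept, wavelet)
    ```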
    On-device modeling of user's social context and familiar places from smartphone-embedded sensor data. (arXiv:2205.08790v2 [cs.LG] UPDATED)
    Context modeling and recognition represent complex tasks that allow mobile and ubiquitous computing applications to adapt to the user's situation. Current solutions mainly focus on limited context information generally processed on centralized architectures, potentially exposing users' personal data to privacy leakage, and missing personalization features. For these reasons, on-device context modeling and recognition represent the current research trend in this area. Among the different information characterizing the user's context in mobile environments, social interactions and visited locations remarkably contribute to the characterization of daily life scenarios. In this paper, we propose a novel, unsupervised and lightweight approach to model the user's social context and her locations based on ego networks directly on the user's mobile device. Relying on this model, the system is able to extract high-level and semantic-rich context features from smartphone-embedded sensor data. Specifically, for the social context it exploits data related to both physical and cyber social interactions among users and their devices. As far as the location context is concerned, we assume that modeling the familiarity degree of a specific location is more relevant for the user's context than the raw location data, both in terms of GPS coordinates and proximity devices. Using 5 real-world datasets, we assess the structure of the social and location ego networks, provide a semantic evaluation of the proposed models, and evaluate their complexity in terms of mobile computing performance. Finally, we demonstrate the relevance of the extracted features by showing the performance of 3 machine learning algorithms in recognizing daily-life situations, obtaining an improvement of 3% in AUROC, 9% in precision, and 5% in recall compared to using only features related to the physical context.
    RL4CO: an Extensive Reinforcement Learning for Combinatorial Optimization Benchmark. (arXiv:2306.17100v1 [cs.LG])
    We introduce RL4CO, an extensive reinforcement learning (RL) for combinatorial optimization (CO) benchmark. RL4CO employs state-of-the-art software libraries as well as best practices in implementation, such as modularity and configuration management, to be efficient and easily modifiable by researchers for adaptations of neural network architecture, environments, and algorithms. Contrary to the existing focus on specific tasks like the traveling salesman problem (TSP) for performance assessment, we underline the importance of scalability and generalization capabilities for diverse optimization tasks. We also systematically benchmark sample efficiency, zero-shot generalization, and adaptability to changes in data distributions of various models. Our experiments show that some recent state-of-the-art methods fall behind their predecessors when evaluated using these new metrics, suggesting the necessity for a more balanced view of the performance of neural CO solvers. We hope RL4CO will encourage the exploration of novel solutions to complex real-world tasks, allowing comparison with existing methods through a standardized interface that decouples the science from the software engineering. We make our library publicly available at https://github.com/kaist-silab/rl4co.
    Safe Model-Based Multi-Agent Mean-Field Reinforcement Learning. (arXiv:2306.17052v1 [cs.LG])
    Many applications, e.g., in shared mobility, require coordinating a large number of agents. Mean-field reinforcement learning addresses the resulting scalability challenge by optimizing the policy of a representative agent. In this paper, we address an important generalization where there exist global constraints on the distribution of agents (e.g., requiring capacity constraints or minimum coverage requirements to be met). We propose Safe-$\text{M}^3$-UCRL, the first model-based algorithm that attains safe policies even in the case of unknown transition dynamics. As a key ingredient, it uses epistemic uncertainty in the transition model within a log-barrier approach to ensure pessimistic constraints satisfaction with high probability. We showcase Safe-$\text{M}^3$-UCRL on the vehicle repositioning problem faced by many shared mobility operators and evaluate its performance through simulations built on Shenzhen taxi trajectory data. Our algorithm effectively meets the demand in critical areas while ensuring service accessibility in regions with low demand.
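    The log-barrier ingredient can be written down generically: penalize the objective so it diverges as the pessimistic (uncertainty-widened) constraint margin approaches zero. A one-line PyTorch sketch, with the barrier weight 1/t as a free parameter and no claim to match the paper's exact objective:

    ```python
    import torch

    def log_barrier_objective(expected_reward, constraint_margin, t=10.0):
        """Constrained policy objective with a log barrier: the barrier term
        diverges as the pessimistic constraint margin approaches zero,
        keeping updates inside the feasible region with high probability."""
        return expected_reward + (1.0 / t) * torch.log(constraint_margin)
    ```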
    An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training. (arXiv:2306.17165v1 [cs.CV])
    We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently. Despite considerable progress in multi-task learning, most efforts focus on learning from multi-label data: a single image set with multiple task labels. Such multi-label data sets are rare, small, and expensive. We say heterogeneous to refer to image sets with different task labels, or to combinations of single-task datasets. Few have explored training on such heterogeneous datasets. General-purpose vision models are still dominated by single-task pretraining, and it remains unclear how to scale up multi-task models by leveraging mainstream vision datasets designed for different purposes. The challenges lie in managing large intrinsic differences among vision tasks, including data distribution, architectures, task-specific modules, dataset scales, and sampling strategies. To address these challenges, we propose to modify and scale up mixture-of-experts (MoE) vision transformers, so that they can simultaneously learn classification, detection, and segmentation on diverse mainstream vision datasets including ImageNet, COCO, and ADE20K. Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks. Due to its emergent modularity, this general-purpose model decomposes into high-performing components, efficiently adapting to downstream tasks. We can fine-tune it with fewer training parameters, fewer model parameters, and less computation. Additionally, its modularity allows for easy expansion in continual-learning-without-forgetting scenarios. Finally, these functions can be controlled and combined to meet various demands of downstream tasks.
    Toward a Perspectivist Turn in Ground Truthing for Predictive Computing. (arXiv:2109.04270v3 [cs.LG] UPDATED)
    Most Artificial Intelligence applications are based on supervised machine learning (ML), which ultimately grounds on manually annotated data. The annotation process is often performed in terms of a majority vote, and this has often proved problematic, as highlighted by recent studies on the evaluation of ML models. In this article we describe and advocate for a different paradigm, which we call data perspectivism, which moves away from traditional gold standard datasets towards the adoption of methods that integrate the opinions and perspectives of the human subjects involved in the knowledge representation step of ML processes. Drawing on previous works that inspired our proposal, we describe the potential of our approach not only for the more subjective tasks (e.g. those related to human language) but also for tasks commonly understood as objective (e.g. medical decision making), and present the main advantages of adopting a perspectivist stance in ML, as well as possible disadvantages, and various ways in which such a stance can be implemented in practice. Finally, we share a set of recommendations and outline a research agenda to advance the perspectivist stance in ML.
    ManimML: Communicating Machine Learning Architectures with Animation. (arXiv:2306.17108v1 [cs.LG])
    There has been an explosion in interest in machine learning (ML) in recent years due to its applications to science and engineering. However, as ML techniques have advanced, tools for explaining and visualizing novel ML algorithms have lagged behind. Animation has been shown to be a powerful tool for making engaging visualizations of systems that dynamically change over time, which makes it well suited to the task of communicating ML algorithms. However, the current approach to animating ML algorithms is to handcraft applications that highlight specific algorithms or use complex generalized animation software. We developed ManimML, an open-source Python library for easily generating animations of ML algorithms directly from code. We sought to leverage ML practitioners' preexisting knowledge of programming rather than requiring them to learn complex animation software. ManimML has a familiar syntax for specifying neural networks that mimics popular deep learning frameworks like PyTorch. A user can take a preexisting neural network architecture and easily write a specification for an animation in ManimML, which will then automatically compose animations for different components of the system into a final animation of the entire neural network. ManimML is open source and available at https://github.com/helblazer811/ManimML.
    Surgical Phase and Instrument Recognition: How to identify appropriate Dataset Splits. (arXiv:2306.16879v1 [cs.LG])
    Purpose: The development of machine learning models for surgical workflow and instrument recognition from temporal data represents a challenging task due to the complex nature of surgical workflows. In particular, the imbalanced distribution of data is one of the major challenges in the domain of surgical workflow recognition. In order to obtain meaningful results, careful partitioning of data into training, validation, and test sets, as well as the selection of suitable evaluation metrics are crucial. Methods: In this work, we present an openly available web-based application that enables interactive exploration of dataset partitions. The proposed visual framework facilitates the assessment of dataset splits for surgical workflow recognition, especially with regard to identifying sub-optimal dataset splits. Currently, it supports visualization of surgical phase and instrument annotations. Results: In order to validate the dedicated interactive visualizations, we use a dataset split of the Cholec80 dataset. This dataset split was specifically selected to reflect a case of strong data imbalance. Using our software, we were able to identify phases, phase transitions, and combinations of surgical instruments that were not represented in one of the sets. Conclusion: In order to obtain meaningful results in highly unbalanced class distributions, special care should be taken with respect to the selection of an appropriate split. Interactive data visualization represents a promising approach for the assessment of machine learning datasets. The source code is available at https://github.com/Cardio-AI/endovis-ml
    End-to-end Autonomous Driving: Challenges and Frontiers. (arXiv:2306.16927v1 [cs.RO])
    The autonomous driving community has witnessed a rapid growth in approaches that embrace an end-to-end algorithm framework, utilizing raw sensor input to generate vehicle motion plans, instead of concentrating on individual tasks such as detection and motion prediction. End-to-end systems, in comparison to modular pipelines, benefit from joint feature optimization for perception and planning. This field has flourished due to the availability of large-scale datasets, closed-loop evaluation, and the increasing need for autonomous driving algorithms to perform effectively in challenging scenarios. In this survey, we provide a comprehensive analysis of more than 250 papers, covering the motivation, roadmap, methodology, challenges, and future trends in end-to-end autonomous driving. We delve into several critical challenges, including multi-modality, interpretability, causal confusion, robustness, and world models, amongst others. Additionally, we discuss current advancements in foundation models and visual pre-training, as well as how to incorporate these techniques within the end-to-end driving framework. To facilitate future research, we maintain an active repository that contains up-to-date links to relevant literature and open-source projects at https://github.com/OpenDriveLab/End-to-end-Autonomous-Driving.
    The ELM Neuron: an Efficient and Expressive Cortical Neuron Model Can Solve Long-Horizon Tasks. (arXiv:2306.16922v1 [cs.NE])
    Traditional large-scale neuroscience models and machine learning utilize simplified models of individual neurons, relying on collective activity and properly adjusted connections to perform complex computations. However, each biological cortical neuron is inherently a sophisticated computational device, as corroborated in a recent study where it took a deep artificial neural network with millions of parameters to replicate the input-output relationship of a detailed biophysical model of a cortical pyramidal neuron. We question the necessity for these many parameters and introduce the Expressive Leaky Memory (ELM) neuron, a biologically inspired, computationally expressive, yet efficient model of a cortical neuron. Remarkably, our ELM neuron requires only 8K trainable parameters to match the aforementioned input-output relationship accurately. We find that an accurate model necessitates multiple memory-like hidden states and intricate nonlinear synaptic integration. To assess the computational ramifications of this design, we evaluate the ELM neuron on various tasks with demanding temporal structures, including a sequential version of the CIFAR-10 classification task, the challenging Pathfinder-X task, and a new dataset based on the Spiking Heidelberg Digits dataset. Our ELM neuron outperforms most transformer-based models on the Pathfinder-X task with 77% accuracy, demonstrates competitive performance on Sequential CIFAR-10, and superior performance compared to classic LSTM models on the variant of the Spiking Heidelberg Digits dataset. These findings indicate a potential for biologically motivated, computationally efficient neuronal models to enhance performance in challenging machine learning tasks.
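As a rough illustration of the two ingredients the abstract highlights, the sketch below implements a single neuron with several leaky memory states and an MLP for nonlinear synaptic integration. This mirrors the general design only, not the paper's exact ELM equations or parameter counts.

```python
# Illustrative neuron with multiple leaky memory states and an MLP for
# nonlinear synaptic integration; an assumption-level sketch, not the
# paper's exact ELM formulation.
import torch
import torch.nn as nn

class LeakyMemoryNeuron(nn.Module):
    def __init__(self, n_inputs, n_memories=8, hidden=32):
        super().__init__()
        # One learnable decay per memory state, kept in (0, 1) by sigmoid.
        self.decay_logits = nn.Parameter(torch.randn(n_memories))
        self.synapse = nn.Sequential(          # nonlinear integration
            nn.Linear(n_inputs + n_memories, hidden), nn.Tanh(),
            nn.Linear(hidden, n_memories))
        self.readout = nn.Linear(n_memories, 1)

    def forward(self, x):                      # x: (batch, time, n_inputs)
        batch, steps, _ = x.shape
        m = x.new_zeros(batch, self.decay_logits.numel())
        lam = torch.sigmoid(self.decay_logits)
        outputs = []
        for t in range(steps):
            drive = self.synapse(torch.cat([x[:, t], m], dim=-1))
            m = lam * m + (1.0 - lam) * drive  # leaky memory update
            outputs.append(self.readout(m))
        return torch.stack(outputs, dim=1)
```

Memory states with slow learned decays are what would let such a unit bridge the long horizons of tasks like Pathfinder-X.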
    NAUTILUS: boosting Bayesian importance nested sampling with deep learning. (arXiv:2306.16923v1 [astro-ph.IM])
    We introduce a novel approach to boost the efficiency of the importance nested sampling (INS) technique for Bayesian posterior and evidence estimation using deep learning. Unlike rejection-based sampling methods such as vanilla nested sampling (NS) or Markov chain Monte Carlo (MCMC) algorithms, importance sampling techniques can use all likelihood evaluations for posterior and evidence estimation. However, for efficient importance sampling, one needs proposal distributions that closely mimic the posterior distributions. We show how to combine INS with deep learning via neural network regression to accomplish this task. We also introduce NAUTILUS, a reference open-source Python implementation of this technique for Bayesian posterior and evidence estimation. We compare NAUTILUS against popular NS and MCMC packages, including EMCEE, DYNESTY, ULTRANEST and POCOMC, on a variety of challenging synthetic problems and real-world applications in exoplanet detection, galaxy SED fitting and cosmology. In all applications, the sampling efficiency of NAUTILUS is substantially higher than that of all other samplers, often by more than an order of magnitude. Simultaneously, NAUTILUS delivers highly accurate results and needs fewer likelihood evaluations than all other samplers tested. We also show that NAUTILUS has good scaling with the dimensionality of the likelihood and is easily parallelizable to many CPUs.
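The principle behind INS is easy to see in a toy example: every likelihood evaluation carries an importance weight, and the better the proposal matches the posterior, the more efficient the estimate. The numpy sketch below fits a simple Gaussian proposal where NAUTILUS would train a neural network; it illustrates the idea only and is not the nautilus package's API.

```python
# Two-stage importance sampling with an adapted proposal: a toy
# stand-in for what NAUTILUS does with neural network regression.
import numpy as np

rng = np.random.default_rng(0)

def log_like(x):                       # unnormalized target density
    return -0.5 * ((x - 2.0) / 0.3) ** 2

# Stage 1: crude exploration with a wide Gaussian proposal.
x0 = rng.normal(0.0, 5.0, size=2000)
q0 = np.exp(-0.5 * (x0 / 5.0) ** 2) / (5.0 * np.sqrt(2 * np.pi))
w0 = np.exp(log_like(x0)) / q0

# Stage 2: fit the proposal to the weighted samples (NAUTILUS trains a
# network here) and importance-sample again.
mu = np.average(x0, weights=w0)
sd = np.sqrt(np.average((x0 - mu) ** 2, weights=w0))
x1 = rng.normal(mu, sd, size=2000)
q1 = np.exp(-0.5 * ((x1 - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
w1 = np.exp(log_like(x1)) / q1

# All likelihood calls contribute to the evidence and posterior.
print("evidence ~", w1.mean())         # ~ 0.3 * sqrt(2 * pi)
print("posterior mean ~", np.average(x1, weights=w1))
```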
    Macro Placement by Wire-Mask-Guided Black-Box Optimization. (arXiv:2306.16844v1 [cs.LG])
The development of very large-scale integration (VLSI) technology has posed new challenges for electronic design automation (EDA) techniques in chip floorplanning. During this process, macro placement is an important subproblem, which tries to determine the positions of all macros with the aim of minimizing half-perimeter wirelength (HPWL) and avoiding overlapping. Previous methods include packing-based, analytical and reinforcement learning methods. In this paper, we propose a new black-box optimization (BBO) framework (called WireMask-BBO) for macro placement, by using a wire-mask-guided greedy procedure for objective evaluation. Equipped with different BBO algorithms, WireMask-BBO empirically achieves significant improvements over previous methods, i.e., it achieves significantly shorter HPWL in much less time. Furthermore, it can fine-tune existing placements by treating them as initial solutions, which can bring up to 50% improvement in HPWL. WireMask-BBO has the potential to significantly improve the quality and efficiency of chip floorplanning, which makes it appealing to researchers and practitioners in EDA and will also promote the application of BBO.
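Stripped of the wire-mask-guided greedy evaluation that is the paper's main contribution, the BBO view of macro placement is simply: propose positions, score them by HPWL, keep the best. The toy below shows that skeleton with naive random search; WireMask-BBO replaces both the proposal mechanism and the objective evaluation with far stronger components.

```python
# Skeleton of black-box optimization for macro placement, scored by
# half-perimeter wirelength (HPWL). Deliberately simplified: no wire
# mask, no overlap handling, random search instead of a real BBO.
import random

NETS = [("m0", "m1"), ("m1", "m2"), ("m0", "m2")]   # toy netlist

def hpwl(pos):
    total = 0
    for net in NETS:
        xs = [pos[m][0] for m in net]
        ys = [pos[m][1] for m in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def random_placement(grid=100):
    return {m: (random.randrange(grid), random.randrange(grid))
            for m in ("m0", "m1", "m2")}

best = min((random_placement() for _ in range(10_000)), key=hpwl)
print("best HPWL:", hpwl(best), best)
```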
    Spectral Batch Normalization: Normalization in the Frequency Domain. (arXiv:2306.16999v1 [cs.CV])
Regularization is a set of techniques that are used to improve the generalization ability of deep neural networks. In this paper, we introduce spectral batch normalization (SBN), a novel effective method to improve generalization by normalizing feature maps in the frequency (spectral) domain. The activations of residual networks without batch normalization (BN) tend to explode exponentially in the depth of the network at initialization. This leads to extremely large feature map norms even though the parameters are relatively small. These explosive dynamics can be very detrimental to learning. BN makes weight decay regularization on the scaling factors $\gamma, \beta$ approximately equivalent to an additive penalty on the norm of the feature maps, which prevents extremely large feature map norms to a certain degree. However, we show experimentally that, despite the approximate additive penalty of BN, feature maps in deep neural networks (DNNs) tend to explode at the beginning of the network and that feature maps of DNNs contain large values during the whole training. This phenomenon also occurs in a weakened form in non-residual networks. SBN addresses large feature maps by normalizing them in the frequency domain. In our experiments, we empirically show that SBN prevents exploding feature maps at initialization and large feature map values during training. Moreover, the normalization of feature maps in the frequency domain leads to more uniformly distributed frequency components. This discourages DNNs from relying on single frequency components of feature maps. These, together with other effects of SBN, have a regularizing effect on the training of residual and non-residual networks. We show experimentally that using SBN in addition to standard regularization methods improves the performance of DNNs by a relevant margin, e.g., ResNet50 on ImageNet by 0.71%.
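In code, the idea amounts to transforming each feature map with an FFT, standardizing the complex coefficients over the batch, and transforming back. The sketch below is one plausible instantiation; the paper's exact choice of statistics and affine parameters may differ.

```python
# Batch normalization in the frequency domain: FFT each feature map,
# standardize complex coefficients over the batch, invert. One plausible
# instantiation, not necessarily the paper's exact formulation.
import torch

def spectral_batch_norm(x, eps=1e-5):
    # x: (batch, channels, height, width)
    f = torch.fft.rfft2(x, norm="ortho")
    mean = f.mean(dim=0, keepdim=True)
    var = (f - mean).abs().pow(2).mean(dim=0, keepdim=True)
    f_hat = (f - mean) / torch.sqrt(var + eps)
    return torch.fft.irfft2(f_hat, s=x.shape[-2:], norm="ortho")

x = torch.randn(8, 4, 16, 16) * 50.0   # artificially large feature maps
print(x.norm().item(), "->", spectral_batch_norm(x).norm().item())
```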
    milliFlow: Scene Flow Estimation on mmWave Radar Point Cloud for Human Motion Sensing. (arXiv:2306.17010v1 [cs.CV])
Approaching the era of ubiquitous computing, human motion sensing plays a crucial role in smart systems for decision making, user interaction, and personalized services. Extensive research has been conducted on human tracking, pose estimation, gesture recognition, and activity recognition, which are predominantly based on cameras in traditional methods. However, the intrusive nature of cameras limits their use in smart home applications. To address this, mmWave radars have gained popularity due to their privacy-friendly features. In this work, we propose milliFlow, a novel deep learning method for scene flow estimation that provides complementary motion information for mmWave point clouds, serving as an intermediate level of features and directly benefiting downstream human motion sensing tasks. Experimental results demonstrate the superior performance of our method with an average 3D endpoint error of 4.6 cm, significantly surpassing the competing approaches. Furthermore, by incorporating scene flow information, we achieve remarkable improvements in human activity recognition, human parsing, and human body part tracking. To foster further research in this area, we provide our codebase and dataset for open access.
    Graph Sampling-based Meta-Learning for Molecular Property Prediction. (arXiv:2306.16780v1 [cs.LG])
    Molecular property is usually observed with a limited number of samples, and researchers have considered property prediction as a few-shot problem. One important fact that has been ignored by prior works is that each molecule can be recorded with several different properties simultaneously. To effectively utilize many-to-many correlations of molecules and properties, we propose a Graph Sampling-based Meta-learning (GS-Meta) framework for few-shot molecular property prediction. First, we construct a Molecule-Property relation Graph (MPG): molecule and properties are nodes, while property labels decide edges. Then, to utilize the topological information of MPG, we reformulate an episode in meta-learning as a subgraph of the MPG, containing a target property node, molecule nodes, and auxiliary property nodes. Third, as episodes in the form of subgraphs are no longer independent of each other, we propose to schedule the subgraph sampling process with a contrastive loss function, which considers the consistency and discrimination of subgraphs. Extensive experiments on 5 commonly-used benchmarks show GS-Meta consistently outperforms state-of-the-art methods by 5.71%-6.93% in ROC-AUC and verify the effectiveness of each proposed module. Our code is available at https://github.com/HICAI-ZJU/GS-Meta.
    Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering. (arXiv:2306.16713v1 [cs.CV])
We study visual question answering in a setting where the answer has to be mined from a pool of relevant and irrelevant images given as a context. For such a setting, a model must first retrieve relevant images from the pool and answer the question from these retrieved images. We refer to this problem as retrieval-based visual question answering (or RETVQA in short). RETVQA is distinctively different from and more challenging than the traditionally studied Visual Question Answering (VQA), where a given question has to be answered with a single relevant image in context. Towards solving the RETVQA task, we propose a unified Multi Image BART (MI-BART) that takes a question and images retrieved by our relevance encoder and generates free-form fluent answers. Further, we introduce the largest dataset in this space, namely RETVQA, which has the following salient features: multi-image and retrieval requirements for VQA, metadata-independent questions over a pool of heterogeneous images, and an expected mix of classification-oriented and open-ended generative answers. Our proposed framework achieves an accuracy of 76.5% and a fluency of 79.3% on the proposed dataset, and it also outperforms state-of-the-art methods by 4.9% and 11.8% on the image segment of the publicly available WebQA dataset on the accuracy and fluency metrics, respectively.
    CLIPAG: Towards Generator-Free Text-to-Image Generation. (arXiv:2306.16805v1 [cs.CV])
    Perceptually Aligned Gradients (PAG) refer to an intriguing property observed in robust image classification models, wherein their input gradients align with human perception and pose semantic meanings. While this phenomenon has gained significant research attention, it was solely studied in the context of unimodal vision-only architectures. In this work, we extend the study of PAG to Vision-Language architectures, which form the foundations for diverse image-text tasks and applications. Through an adversarial robustification finetuning of CLIP, we demonstrate that robust Vision-Language models exhibit PAG in contrast to their vanilla counterparts. This work reveals the merits of CLIP with PAG (CLIPAG) in several vision-language generative tasks. Notably, we show that seamlessly integrating CLIPAG in a "plug-n-play" manner leads to substantial improvements in vision-language generative applications. Furthermore, leveraging its PAG property, CLIPAG enables text-to-image generation without any generative model, which typically requires huge generators.
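Generator-free generation here means gradient ascent on pixels against a CLIP similarity score. The sketch below uses vanilla Hugging Face CLIP to show the mechanics; the paper's point is that a robustified CLIP makes these pixel gradients perceptually aligned, which plain CLIP does not, so this exact loop would tend to produce noise rather than a rose.

```python
# Pixel-space gradient ascent against a CLIP text-image similarity.
# Uses vanilla CLIP for illustration; CLIPAG relies on an adversarially
# robustified CLIP to make the gradients perceptually meaningful.
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
tok = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
for p in model.parameters():
    p.requires_grad_(False)

text = tok(["a photo of a red rose"], return_tensors="pt")
with torch.no_grad():
    t = model.get_text_features(**text)
    t = t / t.norm(dim=-1, keepdim=True)

image = torch.rand(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([image], lr=0.05)
for _ in range(200):
    i = model.get_image_features(pixel_values=image)
    i = i / i.norm(dim=-1, keepdim=True)
    loss = -(i * t).sum()              # maximize cosine similarity
    opt.zero_grad(); loss.backward(); opt.step()
    image.data.clamp_(0.0, 1.0)
```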
    Diffusion-Jump GNNs: Homophiliation via Learnable Metric Filters. (arXiv:2306.16976v1 [cs.LG])
High-order Graph Neural Networks (HO-GNNs) have been developed to infer consistent latent spaces in the heterophilic regime, where the label distribution is not correlated with the graph structure. However, most of the existing HO-GNNs are hop-based, i.e., they rely on the powers of the transition matrix. As a result, these architectures are not fully reactive to the classification loss and the achieved structural filters have static supports. In other words, neither the filters' supports nor their coefficients can be learned with these networks; they are confined, instead, to learning combinations of filters. To address the above concerns, we propose Diffusion-Jump GNNs, a method relying on asymptotic diffusion distances that operates on jumps. A diffusion-pump generates pairwise distances whose projections determine both the support and coefficients of each structural filter. These filters are called jumps because they explore a wide range of scales in order to find bonds between scattered nodes with the same label. The full process is controlled by the classification loss: both the jumps and the diffusion distances react to classification errors, i.e., they are learnable. Homophiliation, i.e., the process of learning piecewise-smooth latent spaces in the heterophilic regime, is formulated as a Dirichlet problem: the known labels determine the border nodes, and the diffusion-pump ensures a minimal deviation of the semi-supervised grouping from a canonical unsupervised grouping. This triggers the update of both the diffusion distances and, consequently, the jumps in order to minimize the classification error. The Dirichlet formulation has several advantages. It leads to the definition of structural heterophily, a novel measure beyond edge heterophily. It also allows us to investigate links with (learnable) diffusion distances, absorbing random walks, and stochastic diffusion.
    Weight Compander: A Simple Weight Reparameterization for Regularization. (arXiv:2306.16993v1 [cs.LG])
Regularization is a set of techniques that are used to improve the generalization ability of deep neural networks. In this paper, we introduce weight compander (WC), a novel effective method to improve generalization by reparameterizing each weight in deep neural networks using a nonlinear function. It is a general, intuitive, cheap, and easy-to-implement method, which can be combined with various other regularization techniques. Large weights in deep neural networks are a sign of a more complex network that is overfitted to the training data. Moreover, regularized networks tend to have a greater range of weights around zero with fewer weights centered at zero. We introduce a weight reparameterization function which is applied to each weight and implicitly reduces overfitting by restricting the magnitude of the weights while forcing them away from zero at the same time. This leads to more democratic decision-making in the network. Firstly, individual weights cannot have too much influence on the prediction process due to the restriction of their magnitude. Secondly, more weights are used in the prediction process, since they are forced away from zero during training. This promotes the extraction of more features from the input data and increases the level of weight redundancy, which makes the network less sensitive to statistical differences between training and test data. We extend our method to learn the hyperparameters of the introduced weight reparameterization function. This avoids hyperparameter search and gives the network the opportunity to align the weight reparameterization with the training progress. We show experimentally that using weight compander in addition to standard regularization methods improves the performance of neural networks.
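With PyTorch's parametrization utility, such a reparameterization is a few lines: every forward pass sees a nonlinear function of the raw weight, and the optimizer updates the raw tensor as usual. The compander function below (bounded magnitude, small weights pushed away from zero) is a hypothetical stand-in, not the paper's exact formula.

```python
# Weight reparameterization via torch parametrizations. The compander
# shape here is a hypothetical stand-in matching the described behavior:
# bounded magnitude, small weights lifted away from zero.
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class Compander(nn.Module):
    def forward(self, w):
        t = torch.tanh(w)
        # sqrt(|t|) exceeds |t| for small weights (pushed away from
        # zero) and saturates near 1 for large ones (bounded magnitude).
        return torch.sign(t) * torch.sqrt(torch.abs(t) + 1e-8)

layer = nn.Linear(16, 16)
parametrize.register_parametrization(layer, "weight", Compander())
# layer.weight is now Compander(raw_weight) on every forward pass.
print(layer.weight.abs().max().item())   # bounded near 1
```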
    Safety-Aware Task Composition for Discrete and Continuous Reinforcement Learning. (arXiv:2306.17033v1 [cs.LG])
    Compositionality is a critical aspect of scalable system design. Reinforcement learning (RL) has recently shown substantial success in task learning, but has only recently begun to truly leverage composition. In this paper, we focus on Boolean composition of learned tasks as opposed to functional or sequential composition. Existing Boolean composition for RL focuses on reaching a satisfying absorbing state in environments with discrete action spaces, but does not support composable safety (i.e., avoidance) constraints. We advance the state of the art in Boolean composition of learned tasks with three contributions: i) introduce two distinct notions of safety in this framework; ii) show how to enforce either safety semantics, prove correctness (under some assumptions), and analyze the trade-offs between the two safety notions; and iii) extend Boolean composition from discrete action spaces to continuous action spaces. We demonstrate these techniques using modified versions of value iteration in a grid world, Deep Q-Network (DQN) in a grid world with image observations, and Twin Delayed DDPG (TD3) in a continuous-observation and continuous-action Bullet physics environment. We believe that these contributions advance the theory of safe reinforcement learning by allowing zero-shot composition of policies satisfying safety properties.
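For goal-reaching value functions, Boolean composition in the style of prior work reduces to elementwise min/max over learned Q-values, and one crude way to bolt on avoidance is to mask unsafe state-action pairs. The sketch below shows that tabular skeleton; the paper's two safety semantics are more refined than this.

```python
# Zero-shot Boolean composition of learned Q-functions: AND -> min,
# OR -> max, with a safety mask as one crude avoidance encoding.
import numpy as np

n_states, n_actions = 10, 4
q_reach_a = np.random.rand(n_states, n_actions)   # learned: "reach A"
q_reach_b = np.random.rand(n_states, n_actions)   # learned: "reach B"
unsafe = np.zeros((n_states, n_actions), dtype=bool)
unsafe[3, :] = True                               # state 3 is forbidden

q_and = np.minimum(q_reach_a, q_reach_b)          # reach A AND B
q_or = np.maximum(q_reach_a, q_reach_b)           # reach A OR B
q_safe = np.where(unsafe, -np.inf, q_or)          # OR, avoiding unsafe

policy = q_safe.argmax(axis=1)                    # composed greedy policy
```

Extending this from tabular Q-values to continuous action spaces (TD3 critics) is one of the paper's contributions.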
    Traceable Group-Wise Self-Optimizing Feature Transformation Learning: A Dual Optimization Perspective. (arXiv:2306.16893v1 [cs.LG])
    Feature transformation aims to reconstruct an effective representation space by mathematically refining the existing features. It serves as a pivotal approach to combat the curse of dimensionality, enhance model generalization, mitigate data sparsity, and extend the applicability of classical models. Existing research predominantly focuses on domain knowledge-based feature engineering or learning latent representations. However, these methods, while insightful, lack full automation and fail to yield a traceable and optimal representation space. An indispensable question arises: Can we concurrently address these limitations when reconstructing a feature space for a machine-learning task? Our initial work took a pioneering step towards this challenge by introducing a novel self-optimizing framework. This framework leverages the power of three cascading reinforced agents to automatically select candidate features and operations for generating improved feature transformation combinations. Despite the impressive strides made, there was room for enhancing its effectiveness and generalization capability. In this extended journal version, we advance our initial work from two distinct yet interconnected perspectives: 1) We propose a refinement of the original framework, which integrates a graph-based state representation method to capture the feature interactions more effectively and develop different Q-learning strategies to alleviate Q-value overestimation further. 2) We utilize a new optimization technique (actor-critic) to train the entire self-optimizing framework in order to accelerate the model convergence and improve the feature transformation performance. Finally, to validate the improved effectiveness and generalization capability of our framework, we perform extensive experiments and conduct comprehensive analyses.
    Gesture Recognition with mmWave Wi-Fi Access Points: Lessons Learned. (arXiv:2306.17062v1 [cs.LG])
In recent years, channel state information (CSI) at sub-6 GHz has been widely exploited for Wi-Fi sensing, particularly for activity and gesture recognition. In this work, we instead explore mmWave (60 GHz) Wi-Fi signals for gesture recognition/pose estimation. Our focus is on the mmWave Wi-Fi signals so that they can be used not only for high data rate communication but also for improved sensing, e.g., for extended reality (XR) applications. For this reason, we extract spatial beam signal-to-noise ratios (SNRs) from the periodic beam training employed by IEEE 802.11ad devices. We consider a set of 10 gestures/poses motivated by XR applications. We conduct experiments in two environments and with three people. As a comparison, we also collect CSI from IEEE 802.11ac devices. To extract features from the CSI and the beam SNR, we leverage a deep neural network (DNN). The DNN classifier achieves promising results on the beam SNR task with state-of-the-art 96.7% accuracy in a single environment, even with a limited dataset. We also investigate the robustness of the beam SNR against CSI across different environments. Our experiments reveal that features from the CSI generalize without additional re-training, while those from beam SNRs do not; therefore, re-training is required in the latter case.
    SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores. (arXiv:2306.16688v1 [cs.DC])
    The ever-growing complexity of reinforcement learning (RL) tasks demands a distributed RL system to efficiently generate and process a massive amount of data to train intelligent agents. However, existing open-source libraries suffer from various limitations, which impede their practical use in challenging scenarios where large-scale training is necessary. While industrial systems from OpenAI and DeepMind have achieved successful large-scale RL training, their system architecture and implementation details remain undisclosed to the community. In this paper, we present a novel abstraction on the dataflows of RL training, which unifies practical RL training across diverse applications into a general framework and enables fine-grained optimizations. Following this abstraction, we develop a scalable, efficient, and extensible distributed RL system called ReaLly Scalable RL (SRL). The system architecture of SRL separates major RL computation components and allows massively parallelized training. Moreover, SRL offers user-friendly and extensible interfaces for customized algorithms. Our evaluation shows that SRL outperforms existing academic libraries in both a single machine and a medium-sized cluster. In a large-scale cluster, the novel architecture of SRL leads to up to 3.7x speedup compared to the design choices adopted by the existing libraries. We also conduct a direct benchmark comparison to OpenAI's industrial system, Rapid, in the challenging hide-and-seek environment. SRL reproduces the same solution as reported by OpenAI with up to 5x speedup in wall-clock time. Furthermore, we also examine the performance of SRL in a much harder variant of the hide-and-seek environment and achieve substantial learning speedup by scaling SRL to over 15k CPU cores and 32 A100 GPUs. Notably, SRL is the first in the academic community to perform RL experiments at such a large scale.
    Provable Advantage of Curriculum Learning on Parity Targets with Mixed Inputs. (arXiv:2306.16921v1 [cs.LG])
    Experimental results have shown that curriculum learning, i.e., presenting simpler examples before more complex ones, can improve the efficiency of learning. Some recent theoretical results also showed that changing the sampling distribution can help neural networks learn parities, with formal results only for large learning rates and one-step arguments. Here we show a separation result in the number of training steps with standard (bounded) learning rates on a common sample distribution: if the data distribution is a mixture of sparse and dense inputs, there exists a regime in which a 2-layer ReLU neural network trained by a curriculum noisy-GD (or SGD) algorithm that uses sparse examples first, can learn parities of sufficiently large degree, while any fully connected neural network of possibly larger width or depth trained by noisy-GD on the unordered samples cannot learn without additional steps. We also provide experimental results supporting the qualitative separation beyond the specific regime of the theoretical results.
    Joint Level Generation and Translation Using Gameplay Videos. (arXiv:2306.16662v1 [cs.CV])
    Procedural Content Generation via Machine Learning (PCGML) faces a significant hurdle that sets it apart from other fields, such as image or text generation, which is limited annotated data. Many existing methods for procedural level generation via machine learning require a secondary representation besides level images. However, the current methods for obtaining such representations are laborious and time-consuming, which contributes to this problem. In this work, we aim to address this problem by utilizing gameplay videos of two human-annotated games to develop a novel multi-tail framework that learns to perform simultaneous level translation and generation. The translation tail of our framework can convert gameplay video frames to an equivalent secondary representation, while its generation tail can produce novel level segments. Evaluation results and comparisons between our framework and baselines suggest that combining the level generation and translation tasks can lead to an overall improved performance regarding both tasks. This represents a possible solution to limited annotated level data, and we demonstrate the potential for future versions to generalize to unseen games.
    AutoML in Heavily Constrained Applications. (arXiv:2306.16913v1 [cs.LG])
    Optimizing a machine learning pipeline for a task at hand requires careful configuration of various hyperparameters, typically supported by an AutoML system that optimizes the hyperparameters for the given training dataset. Yet, depending on the AutoML system's own second-order meta-configuration, the performance of the AutoML process can vary significantly. Current AutoML systems cannot automatically adapt their own configuration to a specific use case. Further, they cannot compile user-defined application constraints on the effectiveness and efficiency of the pipeline and its generation. In this paper, we propose Caml, which uses meta-learning to automatically adapt its own AutoML parameters, such as the search strategy, the validation strategy, and the search space, for a task at hand. The dynamic AutoML strategy of Caml takes user-defined constraints into account and obtains constraint-satisfying pipelines with high predictive performance.
    The Drunkard's Odometry: Estimating Camera Motion in Deforming Scenes. (arXiv:2306.16917v1 [cs.CV])
Estimating camera motion in deformable scenes poses a complex and open research challenge. Most existing non-rigid structure from motion techniques assume that static scene parts are observed alongside deforming parts in order to establish an anchoring reference. However, this assumption does not hold true in certain relevant application cases such as endoscopies. Deformable odometry and SLAM pipelines, which tackle the most challenging scenario of exploratory trajectories, suffer from a lack of robustness and proper quantitative evaluation methodologies. To tackle this issue with a common benchmark, we introduce the Drunkard's Dataset, a challenging collection of synthetic data targeting visual navigation and reconstruction in deformable environments. This dataset is the first large set of exploratory camera trajectories with ground truth inside 3D scenes where every surface exhibits non-rigid deformations over time. Simulations in realistic 3D buildings let us obtain a vast amount of data and ground truth labels, including camera poses, RGB images and depth, optical flow and normal maps at high resolution and quality. We further present a novel deformable odometry method, dubbed the Drunkard's Odometry, which decomposes optical flow estimates into rigid-body camera motion and non-rigid scene deformations. In order to validate our data, our work contains an evaluation of several baselines as well as a novel tracking error metric which does not require ground truth data. Dataset and code: https://davidrecasens.github.io/TheDrunkard'sOdometry/
    Applying language models to algebraic topology: generating simplicial cycles using multi-labeling in Wu's formula. (arXiv:2306.16951v1 [math.AT])
    Computing homotopy groups of spheres has long been a fundamental objective in algebraic topology. Various theoretical and algorithmic approaches have been developed to tackle this problem. In this paper we take a step towards the goal of comprehending the group-theoretic structure of the generators of these homotopy groups by leveraging the power of machine learning. Specifically, in the simplicial group setting of Wu's formula, we reformulate the problem of generating simplicial cycles as a problem of sampling from the intersection of algorithmic datasets related to Dyck languages. We present and evaluate language modelling approaches that employ multi-label information for input sequences, along with the necessary group-theoretic toolkit and non-neural baselines.
    Policy Space Diversity for Non-Transitive Games. (arXiv:2306.16884v1 [cs.GT])
Policy-Space Response Oracles (PSRO) is an influential algorithm framework for approximating a Nash Equilibrium (NE) in multi-agent non-transitive games. Many previous studies have tried to promote policy diversity in PSRO. A major weakness of existing diversity metrics is that a more diverse population (according to those metrics) does not necessarily mean, as we prove in the paper, a better approximation to a NE. To alleviate this problem, we propose a new diversity metric whose improvement guarantees a better approximation to a NE. Meanwhile, we develop a practical and well-justified method to optimize our diversity metric using only state-action samples. By incorporating our diversity regularization into the best-response solving of PSRO, we obtain a new PSRO variant, Policy Space Diversity PSRO (PSD-PSRO). We present the convergence property of PSD-PSRO. Empirically, extensive experiments on various games demonstrate that PSD-PSRO is more effective in producing significantly less exploitable policies than state-of-the-art PSRO variants.
    Transfer Learning with Semi-Supervised Dataset Annotation for Birdcall Classification. (arXiv:2306.16760v1 [cs.SD])
    We present working notes on transfer learning with semi-supervised dataset annotation for the BirdCLEF 2023 competition, focused on identifying African bird species in recorded soundscapes. Our approach utilizes existing off-the-shelf models, BirdNET and MixIT, to address representation and labeling challenges in the competition. We explore the embedding space learned by BirdNET and propose a process to derive an annotated dataset for supervised learning. Our experiments involve various models and feature engineering approaches to maximize performance on the competition leaderboard. The results demonstrate the effectiveness of our approach in classifying bird species and highlight the potential of transfer learning and semi-supervised dataset annotation in similar tasks.
    Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging. (arXiv:2306.16788v1 [cs.LG])
Neural networks can be significantly compressed by pruning, leading to sparse models requiring considerably less storage and floating-point operations while maintaining predictive performance. Model soups (Wortsman et al., 2022) improve generalization and out-of-distribution performance by averaging the parameters of multiple models into a single one without increased inference time. However, identifying models in the same loss basin to leverage both sparsity and parameter averaging is challenging, as averaging arbitrary sparse models reduces the overall sparsity due to differing sparse connectivities. In this work, we address these challenges by demonstrating that exploring a single retraining phase of Iterative Magnitude Pruning (IMP) with varying hyperparameter configurations, such as batch ordering or weight decay, produces models that are suitable for averaging and share the same sparse connectivity by design. Averaging these models significantly enhances generalization performance compared to their individual components. Building on this idea, we introduce Sparse Model Soups (SMS), a novel method for merging sparse models by initiating each prune-retrain cycle with the averaged model of the previous phase. SMS maintains sparsity, exploits the benefits of sparse networks, is modular and fully parallelizable, and substantially improves IMP's performance. Additionally, we demonstrate that SMS can be adapted to enhance the performance of state-of-the-art pruning-during-training approaches.
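Because every soup ingredient comes from the same IMP phase, their zero patterns coincide, so a plain parameter average keeps the sparsity level. A hypothetical PyTorch helper:

```python
# Average sparse models that share one connectivity mask. Since all
# ingredients come from the same pruning phase, zeros line up and the
# mean stays sparse. Hypothetical helper, not the paper's code.
import copy
import torch

def sparse_soup(models):
    avg = copy.deepcopy(models[0])
    with torch.no_grad():
        for name, p in avg.named_parameters():
            stacked = torch.stack(
                [dict(m.named_parameters())[name] for m in models])
            p.copy_(stacked.mean(dim=0))   # shared mask: zeros stay zero
    return avg
```

In SMS, the output of `sparse_soup` over one phase's retrained variants would seed the next prune-retrain cycle.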
    Query-based Hard-Image Retrieval for Object Detection at Test Time. (arXiv:2209.11559v2 [cs.CV] UPDATED)
    There is a longstanding interest in capturing the error behaviour of object detectors by finding images where their performance is likely to be unsatisfactory. In real-world applications such as autonomous driving, it is also crucial to characterise potential failures beyond simple requirements of detection performance. For example, a missed detection of a pedestrian close to an ego vehicle will generally require closer inspection than a missed detection of a car in the distance. The problem of predicting such potential failures at test time has largely been overlooked in the literature and conventional approaches based on detection uncertainty fall short in that they are agnostic to such fine-grained characterisation of errors. In this work, we propose to reformulate the problem of finding "hard" images as a query-based hard image retrieval task, where queries are specific definitions of "hardness", and offer a simple and intuitive method that can solve this task for a large family of queries. Our method is entirely post-hoc, does not require ground-truth annotations, is independent of the choice of a detector, and relies on an efficient Monte Carlo estimation that uses a simple stochastic model in place of the ground-truth. We show experimentally that it can be applied successfully to a wide variety of queries for which it can reliably identify hard images for a given detector without any labelled data. We provide results on ranking and classification tasks using the widely used RetinaNet, Faster-RCNN, Mask-RCNN, and Cascade Mask-RCNN object detectors. The code for this project is available at https://github.com/fiveai/hardest.
    Improving Online Continual Learning Performance and Stability with Temporal Ensembles. (arXiv:2306.16817v1 [cs.LG])
Neural networks are very effective when trained on large datasets for a large number of iterations. However, when they are trained on non-stationary streams of data and in an online fashion, their performance is reduced (1) by the online setup, which limits the availability of data, and (2) due to catastrophic forgetting because of the non-stationary nature of the data. Furthermore, several recent works (Caccia et al., 2022; Lange et al., 2023) showed that replay methods used in continual learning suffer from the stability gap, encountered when evaluating the model continually (rather than only at task boundaries). In this article, we study the effect of model ensembling as a way to improve performance and stability in online continual learning. We notice that naively ensembling models coming from a variety of training tasks increases the performance in online continual learning considerably. Starting from this observation, and drawing inspiration from semi-supervised learning ensembling methods, we use a lightweight temporal ensemble that computes the exponential moving average (EMA) of the weights at test time, and show that it can drastically increase the performance and stability when used in combination with several methods from the literature.
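The temporal ensemble itself is lightweight: a second copy of the weights updated as an exponential moving average during the stream and used only for evaluation. A minimal sketch:

```python
# EMA weight ensemble for online continual learning: train the online
# model as usual, evaluate with the slowly moving average copy.
import copy
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(ema_model, model, decay=0.999):
    for e, p in zip(ema_model.parameters(), model.parameters()):
        e.mul_(decay).add_(p, alpha=1.0 - decay)

model = torch.nn.Linear(8, 2)
ema_model = copy.deepcopy(model)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for _ in range(100):                    # stand-in for the data stream
    x, y = torch.randn(4, 8), torch.randint(0, 2, (4,))
    loss = F.cross_entropy(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
    ema_update(ema_model, model)        # evaluate with ema_model
```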
    Private Covariance Approximation and Eigenvalue-Gap Bounds for Complex Gaussian Perturbations. (arXiv:2306.16648v1 [cs.DS])
    We consider the problem of approximating a $d \times d$ covariance matrix $M$ with a rank-$k$ matrix under $(\varepsilon,\delta)$-differential privacy. We present and analyze a complex variant of the Gaussian mechanism and show that the Frobenius norm of the difference between the matrix output by this mechanism and the best rank-$k$ approximation to $M$ is bounded by roughly $\tilde{O}(\sqrt{kd})$, whenever there is an appropriately large gap between the $k$'th and the $k+1$'th eigenvalues of $M$. This improves on previous work that requires that the gap between every pair of top-$k$ eigenvalues of $M$ is at least $\sqrt{d}$ for a similar bound. Our analysis leverages the fact that the eigenvalues of complex matrix Brownian motion repel more than in the real case, and uses Dyson's stochastic differential equations governing the evolution of its eigenvalues to show that the eigenvalues of the matrix $M$ perturbed by complex Gaussian noise have large gaps with high probability. Our results contribute to the analysis of low-rank approximations under average-case perturbations and to an understanding of eigenvalue gaps for random matrices, which may be of independent interest.
    Eigensubspace of Temporal-Difference Dynamics and How It Improves Value Approximation in Reinforcement Learning. (arXiv:2306.16750v1 [cs.LG])
We propose a novel value approximation method, namely Eigensubspace Regularized Critic (ERC), for deep reinforcement learning (RL). ERC is motivated by an analysis of the dynamics of Q-value approximation error in the Temporal-Difference (TD) method, which follows a path defined by the 1-eigensubspace of the transition kernel associated with the Markov Decision Process (MDP). It reveals a fundamental property of TD learning that has remained unused in previous deep RL approaches. In ERC, we propose a regularizer that guides the approximation error towards the 1-eigensubspace, resulting in a more efficient and stable path of value approximation. Moreover, we theoretically prove the convergence of the ERC method. In addition, theoretical analysis and experiments demonstrate that ERC effectively reduces the variance of value functions. Among 26 tasks in the DMControl benchmark, ERC outperforms state-of-the-art methods on 20, and it shows significant advantages in Q-value approximation and variance reduction. Our code is available at https://sites.google.com/view/erc-ecml23/.
    Length of Stay prediction for Hospital Management using Domain Adaptation. (arXiv:2306.16823v1 [cs.LG])
Inpatient length of stay (LoS) is an important managerial metric which, if known in advance, can be used to efficiently plan admissions, allocate resources, and improve care. Using historical patient data and machine learning techniques, LoS prediction models can be developed. Ethically, these models cannot be used for patient discharge in lieu of unit heads, but they are of utmost necessity for hospital management systems in charge of effective hospital planning. Therefore, the design of the prediction system should be adapted to work in a true hospital setting. In this study, we predict early hospital LoS at the granular level of admission units by applying domain adaptation to leverage information learned from a potential source domain. Time-varying data from 110,079 patient stays across 8 intensive care units (eICU-CRD) and 60,492 stays across 9 units (MIMIC-IV) were extracted. These were fed into a Long Short-Term Memory network and a fully connected network to train a source domain model, the weights of which were transferred either partially or fully to initiate training in target domains. Shapley Additive exPlanations (SHAP) algorithms were used to study the effect of weight transfer on model explainability. Compared to the benchmark, the proposed weight transfer model showed statistically significant gains in prediction accuracy (between 1% and 5%) as well as computation time (up to 2 hours) for some target domains. The proposed method thus provides an adapted clinical decision support system for hospital management that can ease the processes of ethics-committee data access, computation infrastructure, and time.
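Partial weight transfer of this kind can be implemented by filtering the source state dict before loading it into the target model, leaving the task head freshly initialized. Layer names below are hypothetical.

```python
# Partial weight transfer: copy the recurrent feature extractor from the
# source-domain model, re-initialize the head. Names are hypothetical.
import torch.nn as nn

source = nn.ModuleDict({"lstm": nn.LSTM(32, 64, batch_first=True),
                        "head": nn.Linear(64, 1)})
target = nn.ModuleDict({"lstm": nn.LSTM(32, 64, batch_first=True),
                        "head": nn.Linear(64, 1)})

transfer = {k: v for k, v in source.state_dict().items()
            if k.startswith("lstm.")}
result = target.load_state_dict(transfer, strict=False)
print(result.missing_keys)   # only the 'head.*' parameters stay fresh
```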
    An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs. (arXiv:2306.16601v1 [cs.LG])
In recent years, Transformer-based language models have become the standard approach for natural language processing tasks. However, stringent throughput and latency requirements in industrial applications are limiting their adoption. To mitigate the gap, model compression techniques such as structured pruning are being used to improve inference efficiency. However, most existing neural network inference runtimes lack adequate support for structured sparsity. In this paper, we propose an efficient sparse deep learning inference software stack for Transformer-based language models where the weights are pruned with constant block size. Our sparse software accelerator leverages Intel Deep Learning Boost to maximize the performance of sparse-matrix dense-matrix multiplication (commonly abbreviated as SpMM) on CPUs. Our SpMM kernel outperforms the existing sparse libraries (oneMKL, TVM, and LIBXSMM) by an order of magnitude on a wide range of GEMM shapes under 5 representative sparsity ratios (70%, 75%, 80%, 85%, 90%). Moreover, our SpMM kernel shows up to 5x speedup over the dense GEMM kernel of oneDNN, a well-optimized dense library widely used in industry. We apply our sparse accelerator on widely used Transformer-based language models including Bert-Mini, DistilBERT, Bert-Base, and BERT-Large. Our sparse inference software shows up to 1.5x speedup over Neural Magic's Deepsparse under the same configurations on Xeon instances on Amazon Web Services under proxy production latency constraints. We also compare our solution with two framework-based inference solutions, ONNX Runtime and PyTorch, and demonstrate up to 37x speedup over ONNX Runtime and 345x over PyTorch on Xeon under the latency constraints. All the source code is publicly available on GitHub: https://github.com/intel/intel-extension-for-transformers.
    Learning from Synthetic Human Group Activities. (arXiv:2306.16772v1 [cs.CV])
    The understanding of complex human interactions and group activities has garnered attention in human-centric computer vision. However, the advancement of the related tasks is hindered due to the difficulty of obtaining large-scale labeled real-world datasets. To mitigate the issue, we propose M3Act, a multi-view multi-group multi-person human atomic action and group activity data generator. Powered by the Unity engine, M3Act contains simulation-ready 3D scenes and human assets, configurable lighting and camera systems, highly parameterized modular group activities, and a large degree of domain randomization during the data generation process. Our data generator is capable of generating large-scale datasets of human activities with multiple viewpoints, modalities (RGB images, 2D poses, 3D motions), and high-quality annotations for individual persons and multi-person groups (2D bounding boxes, instance segmentation masks, individual actions and group activity categories). Using M3Act, we perform synthetic data pre-training for 2D skeleton-based group activity recognition and RGB-based multi-person pose tracking. The results indicate that learning from our synthetic datasets largely improves the model performances on real-world datasets, with the highest gain of 5.59% and 7.32% respectively in group and person recognition accuracy on CAD2, as well as an improvement of 6.63 in MOTP on HiEve. Pre-training with our synthetic data also leads to faster model convergence on downstream tasks (up to 6.8% faster). Moreover, M3Act opens new research problems for 3D group activity generation. We release M3Act3D, an 87.6-hour 3D motion dataset of human activities with larger group sizes and higher complexity of inter-person interactions than previous multi-person datasets. We define multiple metrics and propose a competitive baseline for the novel task.
    OSP: Boosting Distributed Model Training with 2-stage Synchronization. (arXiv:2306.16926v1 [cs.DC])
Distributed deep learning (DDL) is a promising research area, which aims to increase the efficiency of training deep learning tasks with large datasets and models. As the computation capability of DDL nodes continues to increase, the network connection between nodes is becoming a major bottleneck. Various methods of gradient compression and improved model synchronization have been proposed to address this bottleneck in Parameter-Server-based DDL. However, these two types of methods can result in accuracy loss due to discarded gradients and have limited enhancement on the throughput of model synchronization, respectively. To address these challenges, we propose a new model synchronization method named Overlapped Synchronization Parallel (OSP), which achieves efficient communication with a 2-stage synchronization approach and uses Local-Gradient-based Parameter correction (LGP) to avoid accuracy loss caused by stale parameters. The prototype of OSP has been implemented using PyTorch and evaluated on commonly used deep learning models and datasets with a 9-node testbed. Evaluation results show that OSP can achieve up to 50% improvement in throughput without accuracy loss compared to popular synchronization models.
    SaGess: Sampling Graph Denoising Diffusion Model for Scalable Graph Generation. (arXiv:2306.16827v1 [cs.LG])
Over recent years, denoising diffusion generative models have come to be considered as state-of-the-art methods for synthetic data generation, especially in the case of generating images. These approaches have also proved successful in other applications such as tabular and graph data generation. However, due to computational complexity, to this date, the application of these techniques to graph data has been restricted to small graphs, such as those used in molecular modeling. In this paper, we propose SaGess, a discrete denoising diffusion approach, which is able to generate large real-world networks by augmenting a diffusion model (DiGress) with a generalized divide-and-conquer framework. The algorithm is capable of generating larger graphs by sampling a covering of subgraphs of the initial graph in order to train DiGress. SaGess then constructs a synthetic graph using the subgraphs that have been generated by DiGress. We evaluate the quality of the synthetic data sets against several competitor methods by comparing graph statistics between the original and synthetic samples, as well as evaluating the utility of the synthetic data set produced by using it to train a task-driven model, namely link prediction. In our experiments, SaGess outperforms most of the one-shot state-of-the-art graph generating methods by a significant factor, both on the graph metrics and on the link prediction task.
    Numerical Data Imputation for Multimodal Data Sets: A Probabilistic Nearest-Neighbor Kernel Density Approach. (arXiv:2306.16906v1 [stat.ML])
Numerical data imputation algorithms replace missing values by estimates to leverage incomplete data sets. Current imputation methods seek to minimize the error between the unobserved ground truth and the imputed values. But this strategy can create artifacts leading to poor imputation in the presence of multimodal or complex distributions. To tackle this problem, we introduce the $k$NN$\times$KDE algorithm: a data imputation method combining nearest neighbor estimation ($k$NN) and density estimation with Gaussian kernels (KDE). We compare our method with previous data imputation methods using artificial and real-world data with different data missing scenarios and various data missing rates, and show that our method can cope with complex original data structure, yields lower data imputation errors, and provides probabilistic estimates with higher likelihood than current methods. We release the code as open source for the community: https://github.com/DeltaFloflo/knnxkde
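The idea compresses to a few lines: find donors that are close on the observed coordinates, then treat their values for the missing coordinate as centers of Gaussian kernels and draw from that mixture. The bandwidth and weighting choices below are simplifications of the paper's method.

```python
# kNN x KDE in miniature: neighbors on observed coordinates, Gaussian
# kernels on their values for the missing one. Sampling from the mixture
# (rather than averaging) preserves multimodality. Simplified bandwidth
# and weighting relative to the paper.
import numpy as np

def knn_kde_impute(X, row, col, k=5, bandwidth=0.5, rng=None):
    rng = rng or np.random.default_rng()
    obs = [j for j in range(X.shape[1])
           if j != col and not np.isnan(X[row, j])]
    donors = ~np.isnan(X[:, col])
    donors[row] = False
    dists = np.sqrt(np.nansum((X[donors][:, obs] - X[row, obs]) ** 2,
                              axis=1))
    values = X[donors, col][np.argsort(dists)[:k]]
    center = values[rng.integers(len(values))]  # pick a mixture component
    return rng.normal(center, bandwidth)

X = np.array([[0.0, 1.0], [0.1, 0.9], [2.0, -1.0], [2.1, np.nan]])
print(knn_kde_impute(X, row=3, col=1, k=2))
```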
    An Efficient Virtual Data Generation Method for Reducing Communication in Federated Learning. (arXiv:2306.12088v3 [cs.LG] UPDATED)
Communication overhead is one of the major challenges in Federated Learning (FL). A few classical schemes assume the server can extract auxiliary information about the training data of the participants from the local models to construct a central dummy dataset. The server uses the dummy dataset to finetune the aggregated global model to achieve the target test accuracy in fewer communication rounds. In this paper, we summarize the above solutions into a data-based communication-efficient FL framework. The key to the proposed framework is to design an efficient extraction module (EM) which ensures the dummy dataset has a positive effect on finetuning the aggregated global model. Different from existing methods that use a generator to design the EM, our proposed method, FedINIBoost, borrows the idea of gradient matching to construct the EM. Specifically, FedINIBoost builds a proxy dataset of the real dataset in two steps for each participant at each communication round. Then the server aggregates all the proxy datasets to form a central dummy dataset, which is used to finetune the aggregated global model. Extensive experiments verify the superiority of our method compared with the existing classical methods FedAvg, FedProx, MOON, and FedFTG. Moreover, FedINIBoost plays a significant role in finetuning the performance of the aggregated global model at the initial stage of FL.
    Group-based Robustness: A General Framework for Customized Robustness in the Real World. (arXiv:2306.16614v1 [cs.LG])
Machine-learning models are known to be vulnerable to evasion attacks that perturb model inputs to induce misclassifications. In this work, we identify real-world scenarios where the true threat cannot be assessed accurately by existing attacks. Specifically, we find that conventional metrics measuring targeted and untargeted robustness do not appropriately reflect a model's ability to withstand attacks from one set of source classes to another set of target classes. To address the shortcomings of existing methods, we formally define a new metric, termed group-based robustness, that complements existing metrics and is better-suited for evaluating model performance in certain attack scenarios. We show empirically that group-based robustness allows us to distinguish between models' vulnerability against specific threat models in situations where traditional robustness metrics do not apply. Moreover, to measure group-based robustness efficiently and accurately, we 1) propose two loss functions and 2) identify three new attack strategies. We show empirically that with comparable success rates, finding evasive samples using our new loss functions saves computation by a factor as large as the number of targeted classes, and finding evasive samples using our new attack strategies saves time by up to 99% compared to brute-force search methods. Finally, we propose a defense method that increases group-based robustness by up to 3.52$\times$.
    Sampling weights of deep neural networks. (arXiv:2306.16830v1 [cs.LG])
    We introduce a probability distribution, combined with an efficient sampling algorithm, for weights and biases of fully-connected neural networks. In a supervised learning context, no iterative optimization or gradient computations of internal network parameters are needed to obtain a trained network. The sampling is based on the idea of random feature models. However, instead of a data-agnostic distribution, e.g., a normal distribution, we use both the input and the output training data of the supervised learning problem to sample both shallow and deep networks. We prove that the sampled networks we construct are universal approximators. We also show that our sampling scheme is invariant to rigid body transformations and scaling of the input data. This implies many popular pre-processing techniques are no longer required. For Barron functions, we show that the $L^2$-approximation error of sampled shallow networks decreases with the square root of the number of neurons. In numerical experiments, we demonstrate that sampled networks achieve comparable accuracy as iteratively trained ones, but can be constructed orders of magnitude faster. Our test cases involve a classification benchmark from OpenML, sampling of neural operators to represent maps in function spaces, and transfer learning using well-known architectures.
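A data-driven variant of this sampling can be sketched as follows: draw hidden weights from pairs of training points (direction between the pair, bias placing the nonlinearity between them) and solve only the linear readout by least squares. The scaling constants here are assumptions, not the paper's exact prescription.

```python
# Gradient-free 'training': sample hidden weights from pairs of data
# points, then solve the linear readout in closed form. Constants are
# assumptions; the paper's sampling distribution is more careful.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2          # toy regression target

n_hidden = 200
i, j = rng.integers(0, len(X), size=(2, n_hidden))
diff = X[j] - X[i] + 1e-12
W = diff / (np.linalg.norm(diff, axis=1, keepdims=True) ** 2)
b = -np.einsum("kd,kd->k", W, X[i])             # nonlinearity sits between the pair

H = np.tanh(X @ W.T + b)                        # hidden features, never trained
A = np.hstack([H, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)    # closed-form readout
print("train RMSE:", np.sqrt(np.mean((A @ coef - y) ** 2)))
```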
    Game Level Blending using a Learned Level Representation. (arXiv:2306.16666v1 [cs.LG])
    Game level blending via machine learning, the process of combining features of game levels to create unique and novel game levels using Procedural Content Generation via Machine Learning (PCGML) techniques, has gained increasing popularity in recent years. However, many existing techniques rely on human-annotated level representations, which limits game level blending to a limited number of annotated games. Even with annotated games, researchers often need to author an additional shared representation to make blending possible. In this paper, we present a novel approach to game level blending that employs Clustering-based Tile Embeddings (CTE), a learned level representation technique that can serve as a level representation for unannotated games and a unified level representation across games without the need for human annotation. CTE represents game level tiles as a continuous vector representation, unifying their visual, contextual, and behavioral information. We apply this approach to two classic Nintendo games, Lode Runner and The Legend of Zelda. We run an evaluation comparing the CTE representation to a common, human-annotated representation in the blending task and find that CTE has comparable or better performance without the need for human annotation.
    ArrayBot: Reinforcement Learning for Generalizable Distributed Manipulation through Touch. (arXiv:2306.16857v1 [cs.RO])
    We present ArrayBot, a distributed manipulation system consisting of a $16 \times 16$ array of vertically sliding pillars integrated with tactile sensors, which can simultaneously support, perceive, and manipulate the tabletop objects. Towards generalizable distributed manipulation, we leverage reinforcement learning (RL) algorithms for the automatic discovery of control policies. In the face of the massively redundant actions, we propose to reshape the action space by considering the spatially local action patch and the low-frequency actions in the frequency domain. With this reshaped action space, we train RL agents that can relocate diverse objects through tactile observations only. Surprisingly, we find that the discovered policy can not only generalize to unseen object shapes in the simulator but also transfer to the physical robot without any domain randomization. Leveraging the deployed policy, we present abundant real-world manipulation tasks, illustrating the vast potential of RL on ArrayBot for distributed manipulation.
    Would I have gotten that reward? Long-term credit assignment by counterfactual contribution analysis. (arXiv:2306.16803v1 [cs.LG])
    To make reinforcement learning more sample efficient, we need better credit assignment methods that measure an action's influence on future rewards. Building upon Hindsight Credit Assignment (HCA), we introduce Counterfactual Contribution Analysis (COCOA), a new family of model-based credit assignment algorithms. Our algorithms achieve precise credit assignment by measuring the contribution of actions upon obtaining subsequent rewards, by quantifying a counterfactual query: "Would the agent still have reached this reward if it had taken another action?". We show that measuring contributions w.r.t. rewarding states, as is done in HCA, results in spurious estimates of contributions, causing HCA to degrade towards the high-variance REINFORCE estimator in many relevant environments. Instead, we measure contributions w.r.t. rewards or learned representations of the rewarding objects, resulting in gradient estimates with lower variance. We run experiments on a suite of problems specifically designed to evaluate long-term credit assignment capabilities. By using dynamic programming, we measure ground-truth policy gradients and show that the improved performance of our new model-based credit assignment methods is due to lower bias and variance compared to HCA and common baselines. Our results demonstrate how modeling action contributions towards rewarding outcomes can be leveraged for credit assignment, opening a new path towards sample-efficient reinforcement learning.
    Rapid-INR: Storage Efficient CPU-free DNN Training Using Implicit Neural Representation. (arXiv:2306.16699v1 [cs.CV])
    Implicit Neural Representation (INR) is an innovative approach for representing complex shapes or objects without explicitly defining their geometry or surface structure. Instead, INR represents objects as continuous functions. Previous research has demonstrated the effectiveness of using neural networks as INR for image compression, showcasing comparable performance to traditional methods such as JPEG. However, INR holds potential for various applications beyond image compression. This paper introduces Rapid-INR, a novel approach that utilizes INR for encoding and compressing images, thereby accelerating neural network training in computer vision tasks. Our methodology involves storing the whole dataset directly in INR format on a GPU, mitigating the significant data communication overhead between the CPU and GPU during training. Additionally, the decoding process from INR to RGB format is highly parallelized and executed on-the-fly. To further enhance compression, we propose iterative and dynamic pruning, as well as layer-wise quantization, building upon previous work. We evaluate our framework on the image classification task, utilizing the ResNet-18 backbone network and three commonly used datasets with varying image sizes. Rapid-INR reduces memory consumption to only 5% of the original dataset size and achieves a maximum 6$\times$ speedup over the PyTorch training pipeline, as well as a maximum 1.2$\times$ speedup over the DALI training pipeline, with only a marginal decrease in accuracy. Importantly, Rapid-INR can be readily applied to other computer vision tasks and backbone networks with reasonable engineering effort. Our implementation code is publicly available at https://anonymous.4open.science/r/INR-4BF7.
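    To make the decode-on-the-fly idea concrete, here is a minimal coordinate-MLP sketch of an INR that maps pixel coordinates to RGB values entirely on the accelerator; the architecture, layer sizes, and image resolution are illustrative assumptions, not Rapid-INR's actual network.
    ```python
    import torch
    import torch.nn as nn

    # Minimal sketch of decoding one image from an implicit neural
    # representation (a coordinate MLP). Decoding is a batched forward
    # pass, so it parallelizes trivially on a GPU.
    class INR(nn.Module):
        def __init__(self, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(2, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 3), nn.Sigmoid(),  # RGB in [0, 1]
            )

        def forward(self, coords):          # coords: (N, 2) in [-1, 1]
            return self.net(coords)

    H = W = 32
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                            torch.linspace(-1, 1, W), indexing="ij")
    coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
    image = INR()(coords).reshape(H, W, 3)  # decoded on the fly
    ```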
    "That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks. (arXiv:2204.04636v2 [cs.AI] UPDATED)
    Adversarial attacks are a major challenge faced by current machine learning research. These purposely crafted inputs fool even the most advanced models, precluding their deployment in safety-critical applications. Extensive research in computer vision has been carried out to develop reliable defense strategies. However, the same issue remains less explored in natural language processing. Our work presents a model-agnostic detector of adversarial text examples. The approach identifies patterns in the logits of the target classifier when perturbing the input text. The proposed detector improves the current state-of-the-art performance in recognizing adversarial inputs and exhibits strong generalization capabilities across different NLP models, datasets, and word-level attacks.
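    A hedged sketch of the underlying detection signal follows: repeatedly perturb the input, record how the target model's logits move, and summarize that variation as detector features. `classifier_logits` and `perturb` are hypothetical stand-ins for the target classifier and a word-level perturbation (e.g. a synonym swap), not the paper's exact procedure.
    ```python
    import numpy as np

    # Sketch: adversarial inputs tend to sit near decision boundaries, so
    # their logits react more strongly to small input perturbations.
    def logit_variation_features(text, classifier_logits, perturb, n=16):
        base = classifier_logits(text)                 # (num_classes,)
        deltas = np.stack([classifier_logits(perturb(text)) - base
                           for _ in range(n)])
        # Simple statistics of the logit variation, to be fed to a
        # downstream detector (e.g. a small classifier).
        return np.concatenate([deltas.mean(0), deltas.std(0)])
    ```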
    CDMA: A Practical Cross-Device Federated Learning Algorithm for General Minimax Problems. (arXiv:2105.14216v4 [cs.LG] UPDATED)
    Minimax problems arise in a wide range of important applications including robust adversarial learning and Generative Adversarial Network (GAN) training. Recently, algorithms for minimax problems in the Federated Learning (FL) paradigm have received considerable interest. Existing federated algorithms for general minimax problems require the full aggregation (i.e., aggregation of local model information from all clients) in each training round. Thus, they are inapplicable to an important setting of FL known as the cross-device setting, which involves numerous unreliable mobile/IoT devices. In this paper, we develop the first practical algorithm named CDMA for general minimax problems in the cross-device FL setting. CDMA is based on a Start-Immediately-With-Enough-Responses mechanism, in which the server first signals a subset of clients to perform local computation and then starts to aggregate the local results reported by clients once it receives responses from enough clients in each round. With this mechanism, CDMA is resilient to low client availability. In addition, CDMA incorporates a lightweight global correction in the local update steps of clients, which mitigates the impact of slow network connections. We establish theoretical guarantees of CDMA under different choices of hyperparameters and conduct experiments on AUC maximization, robust adversarial network training, and GAN training tasks. Theoretical and experimental results demonstrate the efficiency of CDMA.
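    The aggregation mechanism can be pictured with a few lines of Python: the server blocks only until any $m$ responses arrive, so stragglers and dropouts cannot stall the round. This is a schematic sketch, not the authors' implementation.
    ```python
    import queue

    # Sketch of Start-Immediately-With-Enough-Responses: signal a subset
    # of clients, then aggregate as soon as any m of them respond, so
    # unreliable or slow devices cannot stall the training round.
    def aggregate_round(responses: queue.Queue, m: int):
        updates = [responses.get() for _ in range(m)]  # blocks until m arrive
        return sum(updates) / m                        # simple averaging
    ```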
    End-to-end Reinforcement Learning for Online Coverage Path Planning in Unknown Environments. (arXiv:2306.16978v1 [cs.RO])
    Coverage path planning is the problem of finding the shortest path that covers the entire free space of a given confined area, with applications ranging from robotic lawn mowing and vacuum cleaning, to demining and search-and-rescue tasks. While offline methods can find provably complete, and in some cases optimal, paths for known environments, their value is limited in online scenarios where the environment is not known beforehand, especially in the presence of non-static obstacles. We propose an end-to-end reinforcement learning-based approach in continuous state and action space, for the online coverage path planning problem that can handle unknown environments. We construct the observation space from both global maps and local sensory inputs, allowing the agent to plan a long-term path, and simultaneously act on short-term obstacle detections. To account for large-scale environments, we propose to use a multi-scale map input representation. Furthermore, we propose a novel total variation reward term for eliminating thin strips of uncovered space in the learned path. To validate the effectiveness of our approach, we perform extensive experiments in simulation with a distance sensor, surpassing the performance of a recent reinforcement learning-based approach.
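    The total variation reward term can be sketched in a few lines: penalize the spatial "raggedness" of the binary coverage map so thin uncovered strips get closed. The sign and weighting conventions below are illustrative assumptions.
    ```python
    import numpy as np

    # Total variation of a binary coverage map: count of neighboring cell
    # pairs that disagree, i.e. the length of the covered/uncovered boundary.
    def tv(coverage):                        # coverage: (H, W) in {0, 1}
        dy = np.abs(np.diff(coverage, axis=0)).sum()
        dx = np.abs(np.diff(coverage, axis=1)).sum()
        return dy + dx

    # Reward the agent for reducing TV between steps, which discourages
    # leaving thin uncovered strips behind.
    def tv_reward(prev_map, new_map, weight=0.1):
        return -weight * (tv(new_map) - tv(prev_map))
    ```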
    Did AI get more negative recently?. (arXiv:2202.13610v3 [cs.CL] UPDATED)
    In this paper, we classify scientific articles in the domain of natural language processing (NLP) and machine learning (ML), as core subfields of artificial intelligence (AI), into whether (i) they extend the current state-of-the-art by the introduction of novel techniques which beat existing models or whether (ii) they mainly criticize the existing state-of-the-art, i.e. that it is deficient with respect to some property (e.g. wrong evaluation, wrong datasets, misleading task specification). We refer to contributions under (i) as having a 'positive stance' and contributions under (ii) as having a 'negative stance' (to related work). We annotate over 1.5 k papers from NLP and ML to train a SciBERT-based model to automatically predict the stance of a paper based on its title and abstract. We then analyse large-scale trends on over 41 k papers from the last approximately 35 years in NLP and ML, finding that papers have become substantially more positive over time, but negative papers also got more negative and we observe considerably more negative papers in recent years. Negative papers are also more influential in terms of citations they receive.
    On the Relationship Between RNN Hidden State Vectors and Semantic Ground Truth. (arXiv:2306.16854v1 [cs.LG])
    We examine the assumption that the hidden-state vectors of recurrent neural networks (RNNs) tend to form clusters of semantically similar vectors, which we dub the clustering hypothesis. While this hypothesis has been assumed in the analysis of RNNs in recent years, its validity has not been studied thoroughly on modern neural network architectures. We examine the clustering hypothesis in the context of RNNs that were trained to recognize regular languages. This enables us to draw on perfect ground-truth automata in our evaluation, against which we can compare the RNN's accuracy and the distribution of the hidden-state vectors. We start by examining the (piecewise linear) separability of an RNN's hidden-state vectors into semantically different classes. We continue the analysis by computing clusters over the hidden-state vector space with multiple state-of-the-art unsupervised clustering approaches. We formally analyze the accuracy of computed clustering functions and the validity of the clustering hypothesis by determining whether clusters group semantically similar vectors to the same state in the ground-truth model. Our evaluation supports the validity of the clustering hypothesis in the majority of examined cases. We observed that the hidden-state vectors of well-trained RNNs are separable, and that the unsupervised clustering techniques succeed in finding clusters of similar state vectors.
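    A minimal sketch of the evaluation pipeline: run an RNN over strings of a regular language, collect its hidden-state vectors, and cluster them with an off-the-shelf method; the cluster assignments can then be compared against the states of the ground-truth automaton. The untrained GRU, binary alphabet, and cluster count below are illustrative.
    ```python
    import torch
    import torch.nn as nn
    from sklearn.cluster import KMeans

    # Collect GRU hidden states over random binary strings and cluster them.
    rnn = nn.GRU(input_size=2, hidden_size=16, batch_first=True)
    strings = torch.randint(0, 2, (200, 10))                 # binary strings
    onehot = torch.nn.functional.one_hot(strings, 2).float()
    with torch.no_grad():
        outputs, _ = rnn(onehot)                             # (200, 10, 16)
    states = outputs.reshape(-1, 16).numpy()                 # one row per prefix
    clusters = KMeans(n_clusters=4, n_init=10).fit_predict(states)
    # `clusters` would then be matched against ground-truth automaton states.
    ```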
    Are Neurons Actually Collapsed? On the Fine-Grained Structure in Neural Representations. (arXiv:2306.17105v1 [cs.LG])
    Recent work has observed an intriguing ''Neural Collapse'' phenomenon in well-trained neural networks, where the last-layer representations of training samples with the same label collapse into each other. This appears to suggest that the last-layer representations are completely determined by the labels, and do not depend on the intrinsic structure of the input distribution. We provide evidence that this is not a complete description, and that the apparent collapse hides important fine-grained structure in the representations. Specifically, even when representations apparently collapse, the small amount of remaining variation can still faithfully and accurately capture the intrinsic structure of the input distribution. As an example, if we train on CIFAR-10 using only 5 coarse-grained labels (by combining two classes into one super-class) until convergence, we can reconstruct the original 10-class labels from the learned representations via unsupervised clustering. The reconstructed labels achieve $93\%$ accuracy on the CIFAR-10 test set, nearly matching the normal CIFAR-10 accuracy for the same architecture. We also provide an initial theoretical result showing the fine-grained representation structure in a simplified synthetic setting. Our results show concretely how the structure of input data can play a significant role in determining the fine-grained structure of neural representations, going beyond what Neural Collapse predicts.  ( 2 min )
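    The probing experiment can be sketched as follows: cluster the penultimate-layer features of the coarsely trained model into the original number of fine classes and score the best cluster-to-label matching. `features` and `fine_labels` are placeholders for the trained model's representations and the held-out fine labels; the Hungarian matching is a standard evaluation choice, not necessarily the paper's.
    ```python
    import numpy as np
    from sklearn.cluster import KMeans
    from scipy.optimize import linear_sum_assignment

    def recovered_label_accuracy(features, fine_labels, n_classes=10):
        pred = KMeans(n_clusters=n_classes, n_init=10).fit_predict(features)
        cost = np.zeros((n_classes, n_classes))
        for p, y in zip(pred, fine_labels):
            cost[p, y] -= 1                        # maximize agreement
        rows, cols = linear_sum_assignment(cost)   # best cluster->label map
        mapping = dict(zip(rows, cols))
        return np.mean([mapping[p] == y for p, y in zip(pred, fine_labels)])
    ```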
    The Importance of Robust Features in Mitigating Catastrophic Forgetting. (arXiv:2306.17091v1 [cs.CV])
    Continual learning (CL) is an approach to address catastrophic forgetting, i.e. the loss of previously learned knowledge when a neural network is trained on new tasks or data distributions. Work on adversarial robustness has decomposed features into robust and non-robust types and demonstrated that models trained on robust features significantly enhance adversarial robustness. However, no study has examined the efficacy of robust features through the lens of CL models in mitigating catastrophic forgetting. In this paper, we introduce the CL robust dataset and train four baseline models on both the standard and CL robust datasets. Our results demonstrate that CL models trained on the CL robust dataset experience less catastrophic forgetting of previously learned tasks than when trained on the standard dataset. Our observations highlight the significance of the features provided to the underlying CL models, showing that CL robust features can alleviate catastrophic forgetting.
    Obeying the Order: Introducing Ordered Transfer Hyperparameter Optimisation. (arXiv:2306.16916v1 [cs.LG])
    We introduce ordered transfer hyperparameter optimisation (OTHPO), a version of transfer learning for hyperparameter optimisation (HPO) where the tasks follow a sequential order. Unlike for state-of-the-art transfer HPO, the assumption is that each task is most correlated to those immediately before it. This matches many deployed settings, where hyperparameters are retuned as more data is collected; for instance tuning a sequence of movie recommendation systems as more movies and ratings are added. We propose a formal definition, outline the differences to related problems and propose a basic OTHPO method that outperforms state-of-the-art transfer HPO. We empirically show the importance of taking order into account using ten benchmarks. The benchmarks are in the setting of gradually accumulating data, and span XGBoost, random forest, approximate k-nearest neighbor, elastic net, support vector machines and a separate real-world motivated optimisation problem. We open source the benchmarks to foster future research on ordered transfer HPO.  ( 2 min )
    Principles and Guidelines for Evaluating Social Robot Navigation Algorithms. (arXiv:2306.16740v1 [cs.RO])
    A major challenge to deploying robots widely is navigation in human-populated environments, commonly referred to as social robot navigation. While the field of social navigation has advanced tremendously in recent years, the fair evaluation of algorithms that tackle social navigation remains hard because it involves not just robotic agents moving in static environments but also dynamic human agents and their perceptions of the appropriateness of robot behavior. In contrast, clear, repeatable, and accessible benchmarks have accelerated progress in fields like computer vision, natural language processing and traditional robot navigation by enabling researchers to fairly compare algorithms, revealing limitations of existing solutions and illuminating promising new directions. We believe the same approach can benefit social navigation. In this paper, we pave the road towards common, widely accessible, and repeatable benchmarking criteria to evaluate social robot navigation. Our contributions include (a) a definition of a socially navigating robot as one that respects the principles of safety, comfort, legibility, politeness, social competency, agent understanding, proactivity, and responsiveness to context, (b) guidelines for the use of metrics, development of scenarios, benchmarks, datasets, and simulators to evaluate social navigation, and (c) a design of a social navigation metrics framework to make it easier to compare results from different simulators, robots and datasets.
    Understanding the Overfitting of the Episodic Meta-training. (arXiv:2306.16873v1 [cs.LG])
    Despite the success of two-stage few-shot classification methods, in the episodic meta-training stage, the model suffers severe overfitting. We hypothesize that it is caused by over-discrimination, i.e., the model learns to over-rely on superficial features that fit base class discrimination while suppressing novel class generalization. To penalize over-discrimination, we introduce knowledge distillation techniques to keep novel generalization knowledge from the teacher model during training. Specifically, we select the teacher model as the one with the best validation accuracy during meta-training and restrict the symmetric Kullback-Leibler (SKL) divergence between the output distribution of the linear classifier of the teacher model and that of the student model. This simple approach outperforms the standard meta-training process. We further propose the Nearest Neighbor Symmetric Kullback-Leibler (NNSKL) divergence for meta-training to push the limits of knowledge distillation techniques. NNSKL takes few-shot tasks as input and penalizes the output of the nearest neighbor classifier, which influences the relationships between query embeddings and support centers. By combining SKL and NNSKL in meta-training, the model achieves even better performance and surpasses state-of-the-art results on several benchmarks.
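    A minimal sketch of the SKL regularizer between teacher and student classifier outputs is given below; any temperature scaling or loss weighting the paper may use is omitted. Since the loss is symmetric, the argument order of the two KL terms does not matter.
    ```python
    import torch
    import torch.nn.functional as F

    # Symmetric KL divergence between two classifiers' output distributions.
    def skl_loss(student_logits, teacher_logits):
        p = F.log_softmax(student_logits, dim=-1)
        q = F.log_softmax(teacher_logits, dim=-1)
        kl_pq = F.kl_div(p, q, log_target=True, reduction="batchmean")
        kl_qp = F.kl_div(q, p, log_target=True, reduction="batchmean")
        return 0.5 * (kl_pq + kl_qp)
    ```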
    Solving Kernel Ridge Regression with Gradient-Based Optimization Methods. (arXiv:2306.16838v1 [stat.ML])
    Kernel ridge regression, KRR, is a non-linear generalization of linear ridge regression. Here, we introduce an equivalent formulation of the objective function of KRR, opening up both for using other penalties than the ridge penalty and for studying kernel ridge regression from the perspective of gradient descent. Using a continuous-time perspective, we derive a closed-form solution, kernel gradient flow, KGF, with regularization through early stopping, which allows us to theoretically bound the differences between KGF and KRR. We generalize KRR by replacing the ridge penalty with the $\ell_1$ and $\ell_\infty$ penalties and utilize the fact that analogously to the similarities between KGF and KRR, the solutions obtained when using these penalties are very similar to those obtained from forward stagewise regression (also known as coordinate descent) and sign gradient descent in combination with early stopping. Thus, the need for computationally heavy proximal gradient descent algorithms can be alleviated. We show theoretically and empirically how these penalties, and corresponding gradient-based optimization algorithms, produce signal-driven and robust regression solutions, respectively. We also investigate kernel gradient descent where the kernel is allowed to change during training, and theoretically address the effects this has on generalization. Based on our findings, we propose an update scheme for the bandwidth of translation-invariant kernels, where we let the bandwidth decrease to zero during training, thus circumventing the need for hyper-parameter selection. We demonstrate on real and synthetic data how decreasing the bandwidth during training outperforms using a constant bandwidth, selected by cross-validation and marginal likelihood maximization. We also show that using a decreasing bandwidth, we are able to achieve both zero training error and a double descent behavior.
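    The KGF-KRR correspondence can be checked numerically in a few lines, under one common normalization in which the gradient-flow fit is $(I - e^{-tK})y$ and early stopping at $t \approx 1/\lambda$ mimics the ridge penalty; the paper's exact scaling may differ.
    ```python
    import numpy as np
    from scipy.linalg import expm

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, (40, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)
    K = np.exp(-0.5 * (X - X.T) ** 2)             # RBF kernel, bandwidth 1

    lam = 0.1
    f_krr = K @ np.linalg.solve(K + lam * np.eye(40), y)   # ridge fit

    t = 1.0 / lam                                  # stopping time ~ 1/lambda
    f_kgf = (np.eye(40) - expm(-t * K)) @ y        # closed-form gradient flow
    print(np.abs(f_krr - f_kgf).max())             # the two fits are close
    ```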
    MNISQ: A Large-Scale Quantum Circuit Dataset for Machine Learning on/for Quantum Computers in the NISQ era. (arXiv:2306.16627v1 [quant-ph])
    We introduce the first large-scale dataset, MNISQ, for both the Quantum and the Classical Machine Learning community during the Noisy Intermediate-Scale Quantum era. MNISQ consists of 4,950,000 data points organized in 9 subdatasets. Building our dataset from the quantum encoding of classical information (e.g., MNIST dataset), we deliver a dataset in a dual form: in quantum form, as circuits, and in classical form, as quantum circuit descriptions (quantum programming language, QASM). Indeed, machine learning research related to quantum computers faces a dual challenge: enhancing machine learning by exploiting the power of quantum computers, while also leveraging state-of-the-art classical machine learning methodologies to advance quantum computing. Therefore, we perform circuit classification on our dataset, tackling the task with both quantum and classical models. In the quantum endeavor, we test our circuit dataset with Quantum Kernel methods, and we show excellent results up to $97\%$ accuracy. In the classical world, the underlying quantum mechanical structures within the quantum circuit data are not trivial. Nevertheless, we test our dataset on three classical models: Structured State Space sequence model (S4), Transformer and LSTM. In particular, the S4 model applied on the tokenized QASM sequences reaches an impressive $77\%$ accuracy. These findings illustrate that quantum circuit-related datasets may admit a quantum advantage, but also that state-of-the-art machine learning methodologies can competently classify and recognize quantum circuits. We finally entrust the quantum and classical machine learning communities with the fundamental challenge of building more quantum-classical datasets like ours and of deriving future benchmarks from our experiments. The dataset is accessible on GitHub and its circuits are easily run in qulacs or qiskit.
    Assessing the Performance of 1D-Convolution Neural Networks to Predict Concentration of Mixture Components from Raman Spectra. (arXiv:2306.16621v1 [eess.SP])
    An emerging application of Raman spectroscopy is monitoring the state of chemical reactors during biologic drug production. Raman shift intensities scale linearly with the concentrations of chemical species and thus can be used to analytically determine real-time concentrations using non-destructive light irradiation in a label-free manner. Chemometric algorithms are used to interpret Raman spectra produced from complex mixtures of bioreactor contents as a reaction evolves. Finding the optimal algorithm for a specific bioreactor environment is challenging due to the lack of freely available Raman mixture datasets. The RaMix Python package addresses this challenge by enabling the generation of synthetic Raman mixture datasets with controllable noise levels to assess the utility of different chemometric algorithm types for real-time monitoring applications. To demonstrate the capabilities of this package and compare the performance of different chemometric algorithms, 48 datasets of simulated spectra were generated using the RaMix Python package. The four tested algorithms include partial least squares regression (PLS), a simple neural network, a simple convolutional neural network (simple CNN), and a 1D convolutional neural network with a ResNet architecture (ResNet). The performance of the PLS and simple CNN models was found to be comparable, with the PLS algorithm slightly outperforming the other models on 83\% of the datasets. The simple CNN model outperformed the other models on large, high-noise datasets, demonstrating the superior capability of convolutional neural networks compared to PLS in analyzing noisy spectra. These results demonstrate the promise of CNNs to automatically extract concentration information from unprocessed, noisy spectra, allowing for better process control of industrial drug production. Code for this project is available at github.com/DexterAntonio/RaMix.
    Moreau Envelope Based Difference-of-weakly-Convex Reformulation and Algorithm for Bilevel Programs. (arXiv:2306.16761v1 [math.OC])
    Recently, Ye et al. (Mathematical Programming 2023) designed an algorithm for solving a specific class of bilevel programs with an emphasis on applications related to hyperparameter selection, utilizing the difference of convex algorithm based on the value function approach reformulation. The proposed algorithm is particularly powerful when the lower level problem is fully convex, such as a support vector machine model or a least absolute shrinkage and selection operator model. In this paper, to suit more applications related to machine learning and statistics, we substantially weaken the underlying assumption from lower level full convexity to weak convexity. Accordingly, we propose a new reformulation using the Moreau envelope of the lower level problem and demonstrate that this reformulation is a difference of weakly convex program. Subsequently, we develop a sequentially convergent algorithm for solving this difference of weakly convex program. To evaluate the effectiveness of our approach, we conduct numerical experiments on the bilevel hyperparameter selection problem from elastic net, sparse group lasso, and RBF kernel support vector machine models.
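    For reference, the Moreau envelope used in the reformulation is the standard object $$ e_{\gamma} v(x) = \inf_{y} \left\{ v(y) + \frac{1}{2\gamma}\|y - x\|^{2} \right\}, $$ which is well defined and smooth for weakly convex $v$ and sufficiently small $\gamma > 0$; per the abstract, replacing the lower-level problem with its envelope is what yields the difference-of-weakly-convex structure.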
    Elastically-Constrained Meta-Learner for Federated Learning. (arXiv:2306.16703v1 [cs.LG])
    Federated learning is an approach to collaboratively training machine learning models across multiple parties without sharing data. One of the challenges in federated learning is non-IID data between clients, as a single model cannot fit the data distributions of all clients. Meta-learning, such as Per-FedAvg, has been introduced to cope with this challenge. Meta-learning learns shared initial parameters for all clients. Each client employs gradient descent to quickly adapt the initialization to its local data distribution to realize model personalization. However, due to the non-convex loss function and the randomness of sampled updates, meta-learning approaches have unstable goals in local adaptation for the same client. This fluctuation in adaptation directions hinders convergence in meta-learning. To overcome this challenge, we use the historical locally adapted model to restrict the direction of the inner loop and propose an elastically-constrained method. As a result, the current-round inner loop keeps historical goals and adapts to better solutions. Experiments show our method boosts meta-learning convergence and improves personalization without additional computation or communication. Our method achieves state-of-the-art results on all metrics across three public datasets.
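    A hedged sketch of the elastic constraint in the inner loop: the client's adaptation is pulled toward its historical adapted parameters by a proximal penalty, so local goals stop fluctuating between rounds. `loss_fn`, the penalty weight `rho`, and the step counts are illustrative assumptions, not the paper's exact formulation.
    ```python
    import torch

    # Inner-loop adaptation with a proximal pull toward the client's
    # historical adapted parameters (`hist_params`). `loss_fn` is a
    # hypothetical handle for the client's local loss.
    def inner_loop(model_params, loss_fn, hist_params, lr=0.01, rho=0.1, steps=5):
        params = [p.clone().requires_grad_(True) for p in model_params]
        for _ in range(steps):
            loss = loss_fn(params)
            loss = loss + rho * sum(((p - h) ** 2).sum()
                                    for p, h in zip(params, hist_params))
            grads = torch.autograd.grad(loss, params)
            params = [(p - lr * g).detach().requires_grad_(True)
                      for p, g in zip(params, grads)]
        return params
    ```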
    GuidedMixup: An Efficient Mixup Strategy Guided by Saliency Maps. (arXiv:2306.16612v1 [cs.CV])
    Data augmentation is now an essential part of the image training process, as it effectively prevents overfitting and makes the model more robust against noisy datasets. Recent mixing augmentation strategies have advanced to generate the mixup mask that can enrich the saliency information, which is a supervisory signal. However, these methods incur a significant computational burden to optimize the mixup mask. From this motivation, we propose a novel saliency-aware mixup method, GuidedMixup, which aims to retain the salient regions in mixup images with low computational overhead. We develop an efficient pairing algorithm that seeks to minimize the conflict between the salient regions of paired images and thereby achieve rich saliency in mixup images. Moreover, GuidedMixup controls the mixup ratio for each pixel to better preserve the salient region by interpolating two paired images smoothly. The experiments on several datasets demonstrate that GuidedMixup provides a good trade-off between augmentation overhead and generalization performance on classification datasets. In addition, our method shows good performance in experiments with corrupted or reduced datasets.
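    A minimal sketch of the pixel-wise mixing step: the per-pixel ratio favors whichever image is more salient at that location, preserving salient regions from both inputs. The saliency maps are placeholders (e.g. gradient magnitudes), and the label-weighting rule here is a simplification of whatever the paper actually uses.
    ```python
    import torch

    # Pixel-wise, saliency-guided mixup of two images x1, x2 of shape
    # (C, H, W), given per-pixel saliency maps sal1, sal2 of shape (H, W).
    def guided_mixup(x1, x2, sal1, sal2, smooth=1e-6):
        lam = (sal1 + smooth) / (sal1 + sal2 + 2 * smooth)  # (H, W) ratios
        lam = lam.unsqueeze(0)                              # broadcast channels
        mixed = lam * x1 + (1 - lam) * x2
        return mixed, lam.mean()                            # image, label weight
    ```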
    Curvature-Independent Last-Iterate Convergence for Games on Riemannian Manifolds. (arXiv:2306.16617v1 [math.OC])
    Numerous applications in machine learning and data analytics can be formulated as equilibrium computation over Riemannian manifolds. Despite the extensive investigation of their Euclidean counterparts, the performance of Riemannian gradient-based algorithms remains opaque and poorly understood. We revisit the original scheme of Riemannian gradient descent (RGD) and analyze it under a geodesic monotonicity assumption, which includes the well-studied geodesically convex-concave min-max optimization problem as a special case. Our main contribution is to show that, despite the phenomenon of distance distortion, the RGD scheme, with a step size that is agnostic to the manifold's curvature, achieves a curvature-independent and linear last-iterate convergence rate in the geodesically strongly monotone setting. To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence in the Riemannian setting has not been considered before.
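    For concreteness, the scheme analyzed is standard Riemannian gradient descent, $$ x_{k+1} = \exp_{x_k}\!\big(-\eta\, \operatorname{grad} f(x_k)\big), $$ where $\exp_{x}$ is the exponential map of the manifold and, notably, the step size $\eta$ is chosen without reference to the curvature.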
    Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning. (arXiv:2306.14522v2 [math.OC] UPDATED)
    The widely used stochastic gradient methods for minimizing nonconvex composite objective functions require the Lipschitz smoothness of the differentiable part. But the requirement does not hold true for problem classes including quadratic inverse problems and training neural networks. To address this issue, we investigate a family of stochastic Bregman proximal gradient (SBPG) methods, which only require smooth adaptivity of the differentiable part. SBPG replaces the upper quadratic approximation used in SGD with the Bregman proximity measure, resulting in a better approximation model that captures the non-Lipschitz gradients of the nonconvex objective. We formulate the vanilla SBPG and establish its convergence properties under the nonconvex setting without a finite-sum structure. Experimental results on quadratic inverse problems testify to the robustness of SBPG. Moreover, we propose a momentum-based version of SBPG (MSBPG) and prove it has improved convergence properties. We apply MSBPG to the training of deep neural networks with a polynomial kernel function, which ensures the smooth adaptivity of the loss function. Experimental results on representative benchmarks demonstrate the effectiveness and robustness of MSBPG in training neural networks. Since the additional computation cost of MSBPG compared with SGD is negligible in large-scale optimization, MSBPG can potentially be employed as a universal open-source optimizer in the future.  ( 2 min )
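    One SBPG step with a quadratic-plus-quartic kernel can be written in closed form up to a scalar equation, as the hedged sketch below illustrates; the kernel $h(x) = \tfrac{1}{2}\|x\|^2 + \tfrac{\delta}{4}\|x\|^4$, the step size, and $\delta$ are illustrative choices, not necessarily the paper's.
    ```python
    import numpy as np

    # One Bregman proximal gradient step with h(x) = 0.5*||x||^2 +
    # 0.25*delta*||x||^4, whose gradient is grad_h(x) = (1 + delta*||x||^2)*x.
    # The update solves grad_h(x_next) = grad_h(x) - step * g, which reduces
    # to the monotone scalar equation s*(1 + delta*s^2) = ||v|| for s = ||x_next||.
    def sbpg_step(x, g, step=0.1, delta=0.1):
        v = (1.0 + delta * np.dot(x, x)) * x - step * g   # dual (mirror) point
        r = np.linalg.norm(v)
        if r == 0.0:
            return v
        lo, hi = 0.0, r          # the root satisfies 0 <= s <= r
        for _ in range(60):      # bisection on the monotone cubic
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if mid * (1 + delta * mid ** 2) < r else (lo, mid)
        s = 0.5 * (lo + hi)
        return v * (s / r)       # x_next points along v with norm s
    ```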
    ViP: A Differentially Private Foundation Model for Computer Vision. (arXiv:2306.08842v2 [cs.CV] UPDATED)
    Artificial intelligence (AI) has seen a tremendous surge in capabilities thanks to the use of foundation models trained on internet-scale data. On the flip side, the uncurated nature of internet-scale data also poses significant privacy and legal risks, as they often contain personal information or copyrighted material that should not be trained on without permission. In this work, we propose as a mitigation measure a recipe to train foundation vision models with differential privacy (DP) guarantee. We identify masked autoencoders as a suitable learning algorithm that aligns well with DP-SGD, and train ViP -- a Vision transformer with differential Privacy -- under a strict privacy budget of $\epsilon=8$ on the LAION400M dataset. We evaluate the quality of representation learned by ViP using standard downstream vision tasks; in particular, ViP achieves a (non-private) linear probing accuracy of $55.7\%$ on ImageNet, comparable to that of end-to-end trained AlexNet (trained and evaluated on ImageNet). Our result suggests that scaling to internet-scale data can be practical for private learning. Code is available at \url{https://github.com/facebookresearch/ViP-MAE}.  ( 2 min )
    An Empirical Evaluation of the Rashomon Effect in Explainable Machine Learning. (arXiv:2306.15786v2 [cs.LG] UPDATED)
    The Rashomon Effect describes the following phenomenon: for a given dataset there may exist many models with equally good performance but with different solution strategies. The Rashomon Effect has implications for Explainable Machine Learning, especially for the comparability of explanations. We provide a unified view on three different comparison scenarios and conduct a quantitative evaluation across different datasets, models, attribution methods, and metrics. We find that hyperparameter-tuning plays a role and that metric selection matters. Our results provide empirical support for previously anecdotal evidence and exhibit challenges for both scientists and practitioners.  ( 2 min )
    Understanding Graph Neural Networks with Generalized Geometric Scattering Transforms. (arXiv:1911.06253v5 [stat.ML] UPDATED)
    The scattering transform is a multilayered wavelet-based deep learning architecture that acts as a model of convolutional neural networks. Recently, several works have introduced generalizations of the scattering transform for non-Euclidean settings such as graphs. Our work builds upon these constructions by introducing windowed and non-windowed geometric scattering transforms for graphs based upon a very general class of asymmetric wavelets. We show that these asymmetric graph scattering transforms have many of the same theoretical guarantees as their symmetric counterparts. As a result, the proposed construction unifies and extends known theoretical results for many of the existing graph scattering architectures. In doing so, this work helps bridge the gap between geometric scattering and other graph neural networks by introducing a large family of networks with provable stability and invariance guarantees. These results lay the groundwork for future deep learning architectures for graph-structured data that have learned filters and also provably have desirable theoretical properties.  ( 2 min )
    SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient. (arXiv:2301.11913v2 [cs.DC] UPDATED)
    Many deep learning applications benefit from using large models with billions of parameters. Training these models is notoriously expensive due to the need for specialized HPC clusters. In this work, we consider alternative setups for training large models: using cheap "preemptible" instances or pooling existing resources from multiple regions. We analyze the performance of existing model-parallel algorithms in these conditions and find configurations where training larger models becomes less communication-intensive. Based on these findings, we propose SWARM parallelism, a model-parallel training algorithm designed for poorly connected, heterogeneous and unreliable devices. SWARM creates temporary randomized pipelines between nodes that are rebalanced in case of failure. We empirically validate our findings and compare SWARM parallelism with existing large-scale training approaches. Finally, we combine our insights with compression strategies to train a large Transformer language model with 1B shared parameters (approximately 13B before sharing) on preemptible T4 GPUs with less than 200Mb/s network.  ( 2 min )
    Concept-Oriented Deep Learning with Large Language Models. (arXiv:2306.17089v1 [cs.LG])
    Large Language Models (LLMs) have been successfully used in many natural-language tasks and applications including text generation and AI chatbots. They are also a promising new technology for concept-oriented deep learning (CODL). However, the prerequisite is that LLMs understand concepts and ensure conceptual consistency. We discuss these in this paper, as well as major uses of LLMs for CODL including concept extraction from text, concept graph extraction from text, and concept learning. Human knowledge consists of both symbolic (conceptual) knowledge and embodied (sensory) knowledge. Text-only LLMs, however, can represent only symbolic (conceptual) knowledge. Multimodal LLMs, on the other hand, are capable of representing the full range (conceptual and sensory) of human knowledge. We discuss conceptual understanding in visual-language LLMs, the most important multimodal LLMs, and major uses of them for CODL including concept extraction from image, concept graph extraction from image, and concept learning. While uses of LLMs for CODL are valuable standalone, they are particularly valuable as part of LLM applications such as AI chatbots.  ( 2 min )
    String Diagrams with Factorized Densities. (arXiv:2305.02506v2 [cs.PL] UPDATED)
    A growing body of research on probabilistic programs and causal models has highlighted the need to reason compositionally about model classes that extend directed graphical models. Both probabilistic programs and causal models define a joint probability density over a set of random variables, and exhibit sparse structure that can be used to reason about causation and conditional independence. This work builds on recent work on Markov categories of probabilistic mappings to define a category whose morphisms combine a joint density, factorized over each sample space, with a deterministic mapping from samples to return values. This is a step towards closing the gap between recent category-theoretic descriptions of probability measures, and the operational definitions of factorized densities that are commonly employed in probabilistic programming and causal inference.  ( 2 min )
    Secure and Fast Asynchronous Vertical Federated Learning via Cascaded Hybrid Optimization. (arXiv:2306.16077v2 [cs.LG] UPDATED)
    Vertical Federated Learning (VFL) attracts increasing attention because it empowers multiple parties to jointly train a privacy-preserving model over vertically partitioned data. Recent research has shown that applying zeroth-order optimization (ZOO) has many advantages in building a practical VFL algorithm. However, a vital problem with the ZOO-based VFL is its slow convergence rate, which limits its application in handling modern large models. To address this problem, we propose a cascaded hybrid optimization method in VFL. In this method, the downstream models (clients) are trained with ZOO to protect privacy and ensure that no internal information is shared. Meanwhile, the upstream model (server) is updated with first-order optimization (FOO) locally, which significantly improves the convergence rate, making it feasible to train the large models without compromising privacy and security. We theoretically prove that our VFL framework converges faster than the ZOO-based VFL, as the convergence of our framework is not limited by the size of the server model, making it effective for training large models with the major part on the server. Extensive experiments demonstrate that our method achieves faster convergence than the ZOO-based VFL framework, while maintaining an equivalent level of privacy protection. Moreover, we show that the convergence of our VFL is comparable to the unsafe FOO-based VFL baseline. Additionally, we demonstrate that our method makes the training of a large model feasible.  ( 3 min )
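    For intuition, the client-side ZOO step can be sketched with a standard two-point gradient estimator; `loss` is a hypothetical handle for querying the end-to-end loss through the server, and the smoothing radius and direction count are illustrative.
    ```python
    import numpy as np

    # Two-point zeroth-order gradient estimate: the client updates its local
    # model from loss values alone, so no gradients or internal information
    # leave the client.
    def zoo_grad(loss, w, mu=1e-3, n_dirs=10):
        d = w.shape[0]
        g = np.zeros(d)
        for _ in range(n_dirs):
            u = np.random.randn(d)                       # random direction
            g += (loss(w + mu * u) - loss(w - mu * u)) / (2 * mu) * u
        return g / n_dirs
    ```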
    Algorithmic Censoring in Dynamic Learning Systems. (arXiv:2305.09035v2 [cs.LG] UPDATED)
    Dynamic learning systems subject to selective labeling exhibit censoring, i.e. persistent negative predictions assigned to one or more subgroups of points. In applications like consumer finance, this results in groups of applicants that are persistently denied and thus never enter into the training data. In this work, we formalize censoring, demonstrate how it can arise, and highlight difficulties in detection. We consider safeguards against censoring - recourse and randomized-exploration - both of which ensure we collect labels for points that would otherwise go unobserved. The resulting techniques allow examples from censored groups to enter into the training data and correct the model. Our results highlight the otherwise unmeasured harms of censoring and demonstrate the effectiveness of mitigation strategies across a range of data generating processes.  ( 2 min )
    Magnitude Invariant Parametrizations Improve Hypernetwork Learning. (arXiv:2304.07645v2 [cs.LG] UPDATED)
    Hypernetworks, neural networks that predict the parameters of another neural network, are powerful models that have been successfully used in diverse applications from image generation to multi-task learning. Unfortunately, existing hypernetworks are often challenging to train. Training typically converges far more slowly than for non-hypernetwork models, and the rate of convergence can be very sensitive to hyperparameter choices. In this work, we identify a fundamental and previously unidentified problem that contributes to the challenge of training hypernetworks: a magnitude proportionality between the inputs and outputs of the hypernetwork. We demonstrate both analytically and empirically that this can lead to unstable optimization, thereby slowing down convergence, and sometimes even preventing any learning. We present a simple solution to this problem using a revised hypernetwork formulation that we call Magnitude Invariant Parametrizations (MIP). We demonstrate the proposed solution on several hypernetwork tasks, where it consistently stabilizes training and achieves faster convergence. Furthermore, we perform a comprehensive ablation study including choices of activation function, normalization strategies, input dimensionality, and hypernetwork architecture; and find that MIP improves training in all scenarios. We provide easy-to-use code that can turn existing networks into MIP-based hypernetworks.  ( 2 min )
    Scale-Space Hypernetworks for Efficient Biomedical Imaging. (arXiv:2304.05448v2 [cs.CV] UPDATED)
    Convolutional Neural Networks (CNNs) are the predominant model used for a variety of medical image analysis tasks. At inference time, these models are computationally intensive, especially with volumetric data. In principle, it is possible to trade accuracy for computational efficiency by manipulating the rescaling factor in the downsample and upsample layers of CNN architectures. However, properly exploring the accuracy-efficiency trade-off is prohibitively expensive with existing models. To address this, we introduce Scale-Space HyperNetworks (SSHN), a method that learns a spectrum of CNNs with varying internal rescaling factors. A single SSHN characterizes an entire Pareto accuracy-efficiency curve of models that match, and occasionally surpass, the outcomes of training many separate networks with fixed rescaling factors. We demonstrate the proposed approach in several medical image analysis applications, comparing SSHN against strategies with both fixed and dynamic rescaling factors. We find that SSHN consistently provides a better accuracy-efficiency trade-off at a fraction of the training cost. Trained SSHNs enable the user to quickly choose a rescaling factor that appropriately balances accuracy and computational efficiency for their particular needs at inference.  ( 2 min )
    Improving Identity-Robustness for Face Models. (arXiv:2304.03838v2 [cs.CV] UPDATED)
    Despite the success of deep-learning models in many tasks, there have been concerns about such models learning shortcuts, and their lack of robustness to irrelevant confounders. When it comes to models directly trained on human faces, a sensitive confounder is that of human identities. Many face-related tasks should ideally be identity-independent, and perform uniformly across different individuals (i.e. be fair). One way to measure and enforce such robustness and performance uniformity is through enforcing it during training, assuming identity-related information is available at scale. However, due to privacy concerns and also the cost of collecting such information, this is often not the case, and most face datasets simply contain input images and their corresponding task-related labels. Thus, improving identity-related robustness without the need for such annotations is of great importance. Here, we explore using face-recognition embedding vectors, as proxies for identities, to enforce such robustness. We propose to use the structure in the face-recognition embedding space, to implicitly emphasize rare samples within each class. We do so by weighting samples according to their conditional inverse density (CID) in the proxy embedding space. Our experiments suggest that such a simple sample weighting scheme not only improves the training robustness but often also improves the overall performance as a result of such robustness. We also show that employing such constraints during training results in models that are significantly less sensitive to different levels of bias in the dataset.  ( 3 min )
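    A hedged sketch of CID-style weighting follows: within each class, the distance to the $k$-th nearest neighbor in the face-embedding space serves as a crude inverse-density signal, so rare samples receive larger weights. The normalization and choice of $k$ are illustrative assumptions.
    ```python
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    # Per-class inverse-density weights from k-NN radii in embedding space.
    def cid_weights(embeddings, labels, k=10):
        weights = np.zeros(len(labels), dtype=float)
        for c in np.unique(labels):
            idx = np.where(labels == c)[0]
            nn = NearestNeighbors(n_neighbors=min(k + 1, len(idx)))
            nn.fit(embeddings[idx])
            dist, _ = nn.kneighbors(embeddings[idx])
            radius = dist[:, -1]                      # distance to k-th neighbor
            weights[idx] = radius / (radius.mean() + 1e-12)  # larger = rarer
        return weights
    ```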
    SVDinsTN: An Integrated Method for Tensor Network Representation with Efficient Structure Search. (arXiv:2305.14912v2 [cs.LG] UPDATED)
    Tensor network (TN) representation is a powerful technique for data analysis and machine learning. It practically involves a challenging TN structure search (TN-SS) problem, which aims to search for the optimal structure to achieve a compact representation. Existing TN-SS methods mainly adopt a bi-level optimization method that leads to excessive computational costs due to repeated structure evaluations. To address this issue, we propose an efficient integrated (single-level) method named SVD-inspired TN decomposition (SVDinsTN), eliminating the need for repeated tedious structure evaluation. By inserting a diagonal factor for each edge of the fully-connected TN, we calculate TN cores and diagonal factors simultaneously, with factor sparsity revealing the most compact TN structure. Experimental results on real-world data demonstrate that SVDinsTN achieves approximately $10\sim{}10^3$ times acceleration in runtime compared to the existing TN-SS methods while maintaining a comparable level of representation ability.  ( 2 min )
    Mastering Percolation-like Games with Deep Learning. (arXiv:2305.07687v2 [cs.LG] UPDATED)
    Though robustness of networks to random attacks has been widely studied, intentional destruction by an intelligent agent is not tractable with previous methods. Here we devise a single-player game on a lattice that mimics the logic of an attacker attempting to destroy a network. The objective of the game is to disable all nodes in the fewest number of steps. We develop a reinforcement learning approach using deep Q-learning that is capable of learning to play this game successfully, and in so doing, to optimally attack a network. Because the learning algorithm is universal, we train agents on different definitions of robustness and compare the learned strategies. We find that superficially similar definitions of robustness induce different strategies in the trained agent, implying that optimally attacking or defending a network is sensitive to the particular objective. Our method provides a new approach to understand network robustness, with potential applications to other discrete processes in disordered systems.  ( 2 min )
    Learning Mixtures of Gaussians with Censored Data. (arXiv:2305.04127v2 [cs.LG] UPDATED)
    We study the problem of learning mixtures of Gaussians with censored data. Statistical learning with censored data is a classical problem, with numerous practical applications, however, finite-sample guarantees for even simple latent variable models such as Gaussian mixtures are missing. Formally, we are given censored data from a mixture of univariate Gaussians $$ \sum_{i=1}^k w_i \mathcal{N}(\mu_i,\sigma^2), $$ i.e. the sample is observed only if it lies inside a set $S$. The goal is to learn the weights $w_i$ and the means $\mu_i$. We propose an algorithm that takes only $\frac{1}{\varepsilon^{O(k)}}$ samples to estimate the weights $w_i$ and the means $\mu_i$ within $\varepsilon$ error.  ( 2 min )
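    Concretely, the density actually observed under censoring is the truncated mixture $$ f_S(x) = \frac{\sum_{i=1}^k w_i\,\mathcal{N}(x;\mu_i,\sigma^2)\,\mathbf{1}[x \in S]}{\sum_{i=1}^k w_i \Pr_{Z\sim\mathcal{N}(\mu_i,\sigma^2)}[Z \in S]}, $$ so any estimator must account for the normalization over the survey set $S$ in the denominator; this is the standard truncated-likelihood form rather than a formula taken from the paper.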
    Variable Importance Matching for Causal Inference. (arXiv:2302.11715v2 [stat.ME] UPDATED)
    Our goal is to produce methods for observational causal inference that are auditable, easy to troubleshoot, accurate for treatment effect estimation, and scalable to high-dimensional data. We describe a general framework called Model-to-Match that achieves these goals by (i) learning a distance metric via outcome modeling, (ii) creating matched groups using the distance metric, and (iii) using the matched groups to estimate treatment effects. Model-to-Match uses variable importance measurements to construct a distance metric, making it a flexible framework that can be adapted to various applications. Concentrating on the scalability of the problem in the number of potential confounders, we operationalize the Model-to-Match framework with LASSO. We derive performance guarantees for settings where LASSO outcome modeling consistently identifies all confounders (importantly without requiring the linear model to be correctly specified). We also provide experimental results demonstrating the method's auditability, accuracy, and scalability as well as extensions to more general nonparametric outcome modeling.  ( 2 min )
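    A minimal sketch of the LASSO instantiation follows: outcome-model coefficients define a weighted distance, each treated unit is matched to its nearest controls under that distance, and matched differences are averaged. The function name, the k-NN matching rule, and the plain averaging are illustrative simplifications of the framework.
    ```python
    import numpy as np
    from sklearn.linear_model import Lasso
    from sklearn.neighbors import NearestNeighbors

    # Model-to-Match sketch: (i) learn variable importances from an outcome
    # model, (ii) match under the induced weighted distance, (iii) average
    # matched treated-minus-control differences.
    def lasso_match_ate(X, y, treated, alpha=0.1, k=5):
        w = np.abs(Lasso(alpha=alpha).fit(X, y).coef_)   # variable importance
        Xw = X * np.sqrt(w)                              # weighted coordinates
        nn = NearestNeighbors(n_neighbors=k).fit(Xw[~treated])
        _, idx = nn.kneighbors(Xw[treated])
        y0_hat = y[~treated][idx].mean(axis=1)           # matched control mean
        return (y[treated] - y0_hat).mean()              # effect estimate
    ```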
    Sparsity exploitation via discovering graphical models in multi-variate time-series forecasting. (arXiv:2306.17090v1 [cs.LG])
    Graph neural networks (GNNs) have been widely applied in multi-variate time-series forecasting (MTSF) tasks because of their capability in capturing the correlations among different time-series. These graph-based learning approaches improve the forecasting performance by discovering and understanding the underlying graph structures, which represent the data correlations. When explicit prior graph structures are not available, most existing works cannot guarantee the sparsity of the generated graphs, which makes the overall model computationally expensive and less interpretable. In this work, we propose a decoupled training method, which includes a graph generating module and a GNN forecasting module. First, we use Graphical Lasso (or GraphLASSO) to directly exploit the sparsity pattern from data to build graph structures in both static and time-varying cases. Second, we fit these graph structures and the input data into a Graph Convolutional Recurrent Network (GCRN) to train a forecasting model. The experimental results on three real-world datasets show that our novel approach has competitive performance against existing state-of-the-art forecasting algorithms while providing sparse, meaningful and explainable graph structures and reducing training time by approximately 40%. Our PyTorch implementation is publicly available at https://github.com/HySonLab/GraphLASSO  ( 2 min )
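    The graph-generating module can be sketched with scikit-learn's GraphicalLasso; the regularization strength and sparsity threshold below are illustrative, and the resulting adjacency would then be handed to the GCRN forecasting module.
    ```python
    import numpy as np
    from sklearn.covariance import GraphicalLasso

    # Estimate a sparse precision matrix from the multivariate series and
    # read off an adjacency matrix for the downstream GNN.
    series = np.random.randn(500, 8)                   # (time, variables)
    prec = GraphicalLasso(alpha=0.3).fit(series).precision_
    adj = (np.abs(prec) > 1e-4).astype(float)          # threshold to a graph
    np.fill_diagonal(adj, 0)                           # drop self-loops
    ```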
    Meta-Calibration Regularized Neural Networks. (arXiv:2303.15057v2 [cs.LG] UPDATED)
    Miscalibration, the mismatch between predicted probability and the true correctness likelihood, has been frequently identified in modern deep neural networks. Recent work in the field aims to address this problem by training calibrated models directly, optimizing a proxy of the calibration error alongside the conventional objective. Recently, Meta-Calibration (MC) showed the effectiveness of using meta-learning for learning better calibrated models. In this work, we extend MC with two main components: (1) a gamma network (gamma-net), a meta network that learns sample-wise gamma values in a continuous space for the focal loss used to optimize the backbone network; (2) smooth expected calibration error (SECE), a Gaussian-kernel-based, unbiased, and differentiable ECE that aims to smoothly optimize gamma-net. The proposed method regularizes the neural network towards better calibration while retaining predictive performance. Our experiments show that (a) learning sample-wise gamma in a continuous space can effectively perform calibration; (b) SECE smoothly optimizes gamma-net towards better robustness to binning schemes; (c) the combination of gamma-net and SECE achieves the best calibration performance across various calibration metrics and retains very competitive predictive performance compared to multiple recently proposed methods on three datasets.  ( 2 min )
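    A rough, binning-free calibration error in the spirit of SECE can be sketched by replacing hard confidence bins with Gaussian kernel weights; the bandwidth, grid, and normalization below are illustrative assumptions rather than the paper's exact formulation.
    ```python
    import numpy as np

    # Kernel-smoothed calibration error: compare kernel-local accuracy with
    # kernel-local confidence over a grid of confidence values, weighting
    # each grid point by the local density of predictions.
    def smooth_ece(conf, correct, bandwidth=0.05, grid=50):
        num, den = 0.0, 0.0
        for c in np.linspace(0, 1, grid):
            w = np.exp(-0.5 * ((conf - c) / bandwidth) ** 2)
            if w.sum() < 1e-8:
                continue
            acc = (w * correct).sum() / w.sum()       # kernel-local accuracy
            avg_conf = (w * conf).sum() / w.sum()     # kernel-local confidence
            num += w.mean() * abs(acc - avg_conf)
            den += w.mean()
        return num / den
    ```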
    Deep Learning for Energy Time-Series Analysis and Forecasting. (arXiv:2306.09129v2 [cs.LG] UPDATED)
    Energy time-series analysis describes the process of analyzing past energy observations and possibly external factors so as to predict the future. Different tasks are involved in the general field of energy time-series analysis and forecasting, with electric load demand forecasting, personalized energy consumption forecasting, as well as renewable energy generation forecasting being among the most common ones. Following the exceptional performance of Deep Learning (DL) in a broad area of vision tasks, DL models have successfully been utilized in time-series forecasting tasks. This paper aims to provide insight into various DL methods geared towards improving the performance in energy time-series forecasting tasks, with special emphasis in Greek Energy Market, and equip the reader with the necessary knowledge to apply these methods in practice.  ( 2 min )
    Improving Patient Pre-screening for Clinical Trials: Assisting Physicians with Large Language Models. (arXiv:2304.07396v2 [cs.LG] UPDATED)
    Physicians considering clinical trials for their patients are met with the laborious process of checking many text-based eligibility criteria. Large Language Models (LLMs) have been shown to perform well for clinical information extraction and clinical reasoning, including medical tests, but not yet in real-world scenarios. This paper investigates the use of InstructGPT to assist physicians in determining eligibility for clinical trials based on a patient's summarised medical profile. Using a prompting strategy combining one-shot, selection-inference and chain-of-thought techniques, we investigate the performance of LLMs on 10 synthetically created patient profiles. Performance is evaluated at four levels: ability to identify screenable eligibility criteria from a trial given a medical profile; ability to classify for each individual criterion whether the patient qualifies; the overall classification whether a patient is eligible for a clinical trial; and the percentage of criteria to be screened by the physician. We evaluated against 146 clinical trials and a total of 4,135 eligibility criteria. The LLM was able to correctly identify the screenability of 72% (2,994/4,135) of the criteria. Additionally, 72% (341/471) of the screenable criteria were evaluated correctly. The resulting trial-level classification as eligible or ineligible resulted in a recall of 0.5. By leveraging LLMs with a physician-in-the-loop, a recall of 1.0 and precision of 0.71 on the clinical trial level can be achieved while reducing the number of criteria to be checked by an estimated 90%. LLMs can be used to assist physicians with pre-screening of patients for clinical trials. By forcing instruction-tuned LLMs to produce chain-of-thought responses, the reasoning can be made transparent to physicians, and the decision process becomes reviewable by them, thereby making such a system feasible for use in real-world scenarios.  ( 3 min )
    Orbit Classification of asteroids using implementation of radial Basis Function on Support Vector Machines. (arXiv:2306.17138v1 [astro-ph.EP])
    This research paper focuses on the implementation of Radial Basis Function (RBF) Support Vector Machines (SVM) for classifying asteroid orbits. Asteroids are important astronomical objects, and their orbits play a crucial role in understanding the dynamics of the solar system. The International Astronomical Union maintains data archives that provide a playground to experiment with various machine-learning techniques. In this study, we explore the application of the RBF SVM algorithm to classify asteroids. The results show that the RBF SVM algorithm achieves good efficiency and accuracy on this dataset. We also analyze the impact of various parameters on the performance of the RBF SVM algorithm and present the optimal parameter settings. Our study highlights the importance of using machine learning techniques for classifying asteroid orbits and the effectiveness of the RBF SVM algorithm in this regard.  ( 2 min )
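    The classification setup can be reproduced schematically with scikit-learn; synthetic data stands in for the IAU orbit archive, and the grid of $C$ and $\gamma$ values is illustrative.
    ```python
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # RBF-kernel SVM with scaled orbital features and a small grid search.
    X, y = make_classification(n_samples=500, n_features=6, random_state=0)
    pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    grid = GridSearchCV(pipe, {"svc__C": [1, 10, 100],
                               "svc__gamma": ["scale", 0.1, 1.0]}, cv=5)
    grid.fit(X, y)
    print(grid.best_params_, grid.best_score_)
    ```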
    Interpretable Water Level Forecaster with Spatiotemporal Causal Attention Mechanisms. (arXiv:2303.00515v6 [cs.LG] UPDATED)
    Forecasting the water level of the Han River is essential for controlling traffic and avoiding natural disasters. The stream flow of the Han River is affected by various, intricately connected factors. Thus, a simple forecasting machine frequently fails to capture its serial pattern. On the other hand, a complex predictive model loses the interpretability of the model output. This work proposes a neural network model with a novel transformer that exploits a causal relationship based on prior knowledge. The transformer consists of spatiotemporal attention weights that describe spatial and temporal causation through masked multilayer networks. Our model has two distinct advantages over existing spatiotemporal forecasting models. First, the model allows heterogeneous predictors for each site, so that flexible regression is applicable to the causal network. Second, the model is adapted to partially identified causal structures. As a result, we have relaxed the constraints on the applicable causal network through our model. In real data analysis, we use the Han River dataset from 2016 to 2021, compare the proposed model with deep learning models, and confirm that our model is interpretable and consistent with prior knowledge, such as the seasonality arising from the tidal force. Furthermore, in prediction performance, our model is better than or competitive with the state-of-the-art models.  ( 3 min )
    An image segmentation algorithm based on multi-scale feature pyramid network. (arXiv:2305.10631v4 [eess.IV] UPDATED)
    Medical image segmentation is particularly critical as a prerequisite for quantitative analysis in the treatment of clinical diseases. For example, in clinical cervical cancer radiotherapy, fast and accurate segmentation of organs and tumors in subabdominal MRI images can optimize the clinical radiotherapy process. Traditional approaches rely on manual annotation by specialist doctors, which is time-consuming and laborious; automatic organ segmentation of subabdominal MRI images is therefore a valuable research topic.  ( 2 min )
    Log-linear Guardedness and its Implications. (arXiv:2210.10012v2 [cs.LG] UPDATED)
    Methods for erasing human-interpretable concepts from neural representations that assume linearity have been found to be tractable and useful. However, the impact of this removal on the behavior of downstream classifiers trained on the modified representations is not fully understood. In this work, we formally define the notion of log-linear guardedness as the inability of an adversary to predict the concept directly from the representation, and study its implications. We show that, in the binary case, under certain assumptions, a downstream log-linear model cannot recover the erased concept. However, we demonstrate that a multiclass log-linear model \emph{can} be constructed that indirectly recovers the concept in some cases, pointing to the inherent limitations of log-linear guardedness as a downstream bias mitigation technique. These findings shed light on the theoretical limitations of linear erasure methods and highlight the need for further research on the connections between intrinsic and extrinsic bias in neural models.  ( 2 min )
    Temporal Robustness against Data Poisoning. (arXiv:2302.03684v2 [cs.LG] UPDATED)
Data poisoning considers cases when an adversary manipulates the behavior of machine learning algorithms through malicious training data. Existing threat models of data poisoning center around a single metric, the number of poisoned samples. In consequence, if attackers can poison more samples than expected with affordable overhead, as in many practical scenarios, they may be able to render existing defenses ineffective in a short time. To address this issue, we leverage timestamps denoting the birth dates of data, which are often available but neglected in the past. Benefiting from these timestamps, we propose a temporal threat model of data poisoning with two novel metrics, earliness and duration, which respectively measure how early an attack started and how long it lasted. Using these metrics, we define the notions of temporal robustness against data poisoning, providing a meaningful sense of protection even with unbounded amounts of poisoned samples. We present a benchmark with an evaluation protocol simulating continuous data collection and periodic deployments of updated models, thus enabling empirical evaluation of temporal robustness. Lastly, we develop and also empirically verify a baseline defense, namely temporal aggregation, offering provable temporal robustness and highlighting the potential of our temporal threat model for data poisoning.  ( 2 min )
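The baseline defense named in the abstract, temporal aggregation, lends itself to a short sketch: train base models on disjoint time windows and aggregate their predictions by majority vote, so an attack confined to a bounded time span can corrupt only a bounded number of voters. The window count and base model below are illustrative assumptions, not the paper's exact protocol.

```python
# A hedged sketch of temporal aggregation: base models trained on time
# windows, aggregated by majority vote. Window sizes are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 5))
y = (X[:, 0] > 0).astype(int)
timestamps = np.sort(rng.integers(0, 100, size=600))  # birth dates of samples

n_windows = 5
edges = np.quantile(timestamps, np.linspace(0, 1, n_windows + 1))
models = []
for lo, hi in zip(edges[:-1], edges[1:]):
    mask = (timestamps >= lo) & (timestamps <= hi)
    models.append(LogisticRegression().fit(X[mask], y[mask]))

def predict_majority(x):
    # A poisoning attack spanning few windows flips few of these votes.
    votes = np.array([m.predict(x.reshape(1, -1))[0] for m in models])
    return np.bincount(votes).argmax()

print(predict_majority(X[0]))
```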
    Dynamic-Resolution Model Learning for Object Pile Manipulation. (arXiv:2306.16700v1 [cs.RO])
Dynamics models learned from visual observations have been shown to be effective in various robotic manipulation tasks. One of the key questions for learning such dynamics models is what scene representation to use. Prior works typically assume representation at a fixed dimension or resolution, which may be inefficient for simple tasks and ineffective for more complicated tasks. In this work, we investigate how to learn dynamic and adaptive representations at different levels of abstraction to achieve the optimal trade-off between efficiency and effectiveness. Specifically, we construct dynamic-resolution particle representations of the environment and learn a unified dynamics model using graph neural networks (GNNs) that allows continuous selection of the abstraction level. During test time, the agent can adaptively determine the optimal resolution at each model-predictive control (MPC) step. We evaluate our method in object pile manipulation, a task we commonly encounter in cooking, agriculture, manufacturing, and pharmaceutical applications. Through comprehensive evaluations both in the simulation and the real world, we show that our method achieves significantly better performance than state-of-the-art fixed-resolution baselines at the gathering, sorting, and redistribution of granular object piles made with various instances like coffee beans, almonds, corn, etc.  ( 2 min )
    Non-Convex Optimizations for Machine Learning with Theoretical Guarantee: Robust Matrix Completion and Neural Network Learning. (arXiv:2306.16557v1 [cs.LG])
Despite recent developments in machine learning, most learning systems still operate as "black boxes", whose performance cannot be understood or derived. With rising public concern over safety and privacy, designing explainable learning systems has become a new trend in machine learning. In general, many machine learning problems are formulated as minimizing (or maximizing) some loss function. Since real data are most likely generated by non-linear models, the loss function is non-convex in general. Unlike in convex optimization, gradient descent algorithms can be trapped in spurious local minima when solving non-convex problems. It is therefore challenging to provide explainable algorithms when studying non-convex optimization problems. In this thesis, two popular non-convex problems are studied: (1) low-rank matrix completion and (2) neural network learning.  ( 2 min )
    Towards Optimal Randomized Strategies in Adversarial Example Game. (arXiv:2306.16738v1 [cs.LG])
The vulnerability of deep neural network models to adversarial example attacks is a practical challenge in many artificial intelligence applications. A recent line of work shows that the use of randomization in adversarial training is the key to finding optimal strategies against adversarial example attacks. However, in a fully randomized setting where both the defender and the attacker can use randomized strategies, there is no efficient algorithm for finding such an optimal strategy. To fill this gap, we propose the first algorithm of its kind, called FRAT, which models the problem with a new infinite-dimensional continuous-time flow on probability distribution spaces. FRAT maintains a lightweight mixture of models for the defender, with flexibility to efficiently update mixing weights and model parameters at each iteration. Furthermore, FRAT utilizes lightweight sampling subroutines to construct a random strategy for the attacker. We prove that the continuous-time limit of FRAT converges to a mixed Nash equilibrium in a zero-sum game formed by a defender and an attacker. Experimental results also demonstrate the efficiency of FRAT on CIFAR-10 and CIFAR-100 datasets.  ( 2 min )
    Performance Analysis of DNN Inference/Training with Convolution and non-Convolution Operations. (arXiv:2306.16767v1 [cs.AR])
Today's performance analysis frameworks for deep learning accelerators suffer from two significant limitations. First, although modern convolutional neural networks (CNNs) consist of many types of layers other than convolution, especially during training, these frameworks largely focus on convolution layers only. Second, these frameworks are generally targeted towards inference, and lack support for training operations. This work proposes a novel performance analysis framework, SimDIT, for general ASIC-based systolic hardware accelerator platforms. The modeling effort of SimDIT comprehensively covers convolution and non-convolution operations of both CNN inference and training on a highly parameterizable hardware substrate. SimDIT is integrated with a backend silicon implementation flow and provides detailed end-to-end performance statistics (i.e., data access cost, cycle counts, energy, and power) for executing CNN inference and training workloads. SimDIT-enabled performance analysis reveals that on a 64x64 processing array, non-convolution operations constitute 59.5% of total runtime for the ResNet-50 training workload. In addition, by optimally distributing available off-chip DRAM bandwidth and on-chip SRAM resources, SimDIT achieves 18x performance improvement over a generic static resource allocation for ResNet-50 inference.  ( 2 min )
    NaturalInversion: Data-Free Image Synthesis Improving Real-World Consistency. (arXiv:2306.16661v1 [cs.CV])
We introduce NaturalInversion, a novel model inversion-based method to synthesize images that agree well with the original data distribution without using real data. In NaturalInversion, we propose: (1) a Feature Transfer Pyramid which uses an enhanced image prior of the original data by combining the multi-scale feature maps extracted from the pre-trained classifier, (2) a one-to-one generative approach where only one batch of images is synthesized by one generator, bringing non-linearity to the optimization and easing the overall optimization process, (3) learnable Adaptive Channel Scaling parameters which are trained end-to-end to scale the output image channels to further utilize the original image prior. With NaturalInversion, we synthesize images from classifiers trained on CIFAR-10/100 and show that our images are more consistent with the original data distribution than prior works, by visualization and additional analysis. Furthermore, our synthesized images outperform prior works on various applications such as knowledge distillation and pruning, demonstrating the effectiveness of our proposed method.  ( 2 min )
    Feature Selection: A perspective on inter-attribute cooperation. (arXiv:2306.16559v1 [cs.LG])
High-dimensional datasets pose a challenge for learning tasks in data mining and machine learning. Feature selection is an effective technique for dimensionality reduction. It is often an essential data processing step prior to applying a learning algorithm. Over the decades, filter feature selection methods have evolved from simple univariate relevance ranking algorithms to more sophisticated relevance-redundancy trade-offs and, in recent years, to multivariate dependency-based approaches. This tendency to capture multivariate dependence aims at obtaining unique information about the class from the inter-cooperation among features. This paper presents a comprehensive survey of the state-of-the-art work on filter feature selection methods assisted by feature inter-cooperation, and summarizes the contributions of different approaches found in the literature. Furthermore, current issues and challenges are introduced to identify promising future research and development.  ( 2 min )
    Improving Fairness in Deepfake Detection. (arXiv:2306.16635v1 [cs.CV])
Despite the development of effective deepfake detection models in recent years, several recent studies have demonstrated that biases in the training data utilized to develop deepfake detection models can lead to unfair performance for demographic groups of different races and/or genders. This can result in these groups being unfairly targeted or excluded from detection, allowing misclassified deepfakes to manipulate public opinion and erode trust in the model. While these studies have focused on identifying and evaluating the unfairness in deepfake detection, no methods have been developed to address the fairness issue of deepfake detection at the algorithm level. In this work, we make the first attempt to improve deepfake detection fairness by proposing novel loss functions to train fair deepfake detection models in ways that are agnostic or aware of demographic factors. Extensive experiments on four deepfake datasets and five deepfake detectors demonstrate the effectiveness and flexibility of our approach in improving deepfake detection fairness.  ( 2 min )
    DNA-TEQ: An Adaptive Exponential Quantization of Tensors for DNN Inference. (arXiv:2306.16430v1 [cs.LG])
Quantization is commonly used in Deep Neural Networks (DNNs) to reduce the storage and computational complexity by decreasing the arithmetical precision of activations and weights, a.k.a. tensors. Efficient hardware architectures employ linear quantization to enable the deployment of recent DNNs onto embedded systems and mobile devices. However, linear uniform quantization cannot usually reduce the numerical precision to less than 8 bits without sacrificing high performance in terms of model accuracy. The performance loss is due to the fact that tensors do not follow uniform distributions. In this paper, we show that a significant fraction of tensors fits an exponential distribution. Then, we propose DNA-TEQ to exponentially quantize DNN tensors with an adaptive scheme that achieves the best trade-off between numerical precision and accuracy loss. The experimental results show that DNA-TEQ provides a much lower quantization bit-width compared to previous proposals, resulting in an average compression ratio of 40% over the linear INT8 baseline, with negligible accuracy loss and without retraining the DNNs. Besides, DNA-TEQ leads the way in performing dot-product operations in the exponential domain, which saves 66% of energy consumption on average for a set of widely used DNNs.  ( 2 min )
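To make the exponential-domain idea concrete, here is a hedged sketch of a generic exponential (log-domain) quantizer. DNA-TEQ's adaptive per-tensor selection of the base and offset is only mimicked here with fixed, assumed values; this is not the paper's exact scheme.

```python
# A hedged sketch of exponential (log-domain) tensor quantization in the
# spirit of DNA-TEQ; base, scale, and bit-width choices are assumptions.
import numpy as np

def exp_quantize(x, bits=4, base=1.5, alpha=None):
    """Map |x| to the nearest power of `base` times a per-tensor scale."""
    if alpha is None:
        alpha = np.abs(x).max() + 1e-12            # per-tensor scale (assumption)
    levels = 2 ** (bits - 1) - 1                   # reserve one bit for the sign
    # Exponent of the nearest power of `base`; <= 0 because |x|/alpha <= 1.
    q = np.round(np.log(np.maximum(np.abs(x) / alpha, 1e-12)) / np.log(base))
    q = np.clip(q, -levels, 0)
    return np.sign(x) * alpha * base ** q, q

x = np.random.default_rng(0).normal(size=1000)
xq, q = exp_quantize(x)
print("mean abs quantization error:", np.abs(x - xq).mean())
```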
    Finite-Sample Symmetric Mean Estimation with Fisher Information Rate. (arXiv:2306.16573v1 [math.ST])
    The mean of an unknown variance-$\sigma^2$ distribution $f$ can be estimated from $n$ samples with variance $\frac{\sigma^2}{n}$ and nearly corresponding subgaussian rate. When $f$ is known up to translation, this can be improved asymptotically to $\frac{1}{n\mathcal I}$, where $\mathcal I$ is the Fisher information of the distribution. Such an improvement is not possible for general unknown $f$, but [Stone, 1975] showed that this asymptotic convergence $\textit{is}$ possible if $f$ is $\textit{symmetric}$ about its mean. Stone's bound is asymptotic, however: the $n$ required for convergence depends in an unspecified way on the distribution $f$ and failure probability $\delta$. In this paper we give finite-sample guarantees for symmetric mean estimation in terms of Fisher information. For every $f, n, \delta$ with $n > \log \frac{1}{\delta}$, we get convergence close to a subgaussian with variance $\frac{1}{n \mathcal I_r}$, where $\mathcal I_r$ is the $r$-$\textit{smoothed}$ Fisher information with smoothing radius $r$ that decays polynomially in $n$. Such a bound essentially matches the finite-sample guarantees in the known-$f$ setting.  ( 2 min )
    HNO: Hyena Neural Operator for solving PDEs. (arXiv:2306.16524v1 [cs.LG])
Numerically solving partial differential equations (PDEs) typically requires fine discretization to resolve the necessary spatiotemporal scales, which can be computationally expensive. Recent advances in deep learning have provided a new approach to solving PDEs that involves the use of neural operators. Neural operators are neural network architectures that learn mappings between function spaces and can solve partial differential equations from data. This study utilizes a novel neural operator called Hyena, which employs a long convolutional filter parameterized by a multilayer perceptron. The Hyena operator has sub-quadratic complexity and uses a state-space model to parameterize a long convolution with a global receptive field. This mechanism enhances the model's comprehension of the input's context and enables data-dependent weights for different PDE instances. To measure how effective the layers are in solving PDEs, we conduct experiments on Burgers' equation and the Navier-Stokes equations. Our findings indicate that the Hyena neural operator can serve as an efficient and accurate model for learning the solution operators of PDEs. The data and code used can be found at: https://github.com/Saupatil07/Hyena-Neural-Operator  ( 2 min )
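A minimal sketch of the core ingredient, assuming the standard Hyena recipe: a long convolution whose filter is generated implicitly by a small MLP over positions and applied in O(L log L) via the FFT. The sizes, decay window, and MLP shape are illustrative assumptions, not the paper's configuration.

```python
# A hedged sketch of an MLP-parameterized long convolution (Hyena-style).
import numpy as np

rng = np.random.default_rng(0)
L = 256                                  # sequence length
t = np.linspace(0, 1, L)[:, None]        # positional input to the filter MLP

# Tiny MLP: position -> filter value (implicitly parameterizes the filter).
W1, b1 = rng.normal(size=(1, 32)), np.zeros(32)
W2, b2 = rng.normal(size=(32, 1)), np.zeros(1)
h = np.tanh(t @ W1 + b1) @ W2 + b2       # implicit filter, shape (L, 1)
k = h[:, 0] * np.exp(-5 * t[:, 0])       # decay window (a common assumption)

u = rng.normal(size=L)                   # input signal, e.g., a PDE state
# Circular convolution via FFT: global receptive field in O(L log L).
y = np.fft.irfft(np.fft.rfft(u) * np.fft.rfft(k), n=L)
print(y.shape)
```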
    A Collaborative Transfer Learning Framework for Cross-domain Recommendation. (arXiv:2306.16425v1 [cs.IR])
In recommendation systems, there are multiple business domains to meet the diverse interests and needs of users, and the click-through rate (CTR) of each domain can be quite different, which leads to the demand for CTR prediction modeling for different business domains. The industry solution is to use domain-specific models or transfer learning techniques for each domain. The disadvantage of the former is that data from other domains is not utilized by a single-domain model, while the latter leverages all the data from different domains, but the fine-tuned model of transfer learning may trap the model in a local optimum of the source domain, making it difficult to fit the target domain. Meanwhile, significant differences in data quantity and feature schemas between different domains, known as domain shift, may lead to negative transfer during the transfer process. To overcome these challenges, we propose the Collaborative Cross-Domain Transfer Learning Framework (CCTL). CCTL evaluates the information gain of the source domain on the target domain using a symmetric companion network and adjusts the information transfer weight of each source domain sample using an information flow network. This approach enables full utilization of other domains' data while avoiding negative transfer. Additionally, a representation enhancement network is used as an auxiliary task to preserve domain-specific features. In comprehensive experiments on both public and real-world industrial datasets, CCTL achieved SOTA scores on offline metrics. Meanwhile, the CCTL algorithm has been deployed in Meituan, bringing a 4.37% CTR and 5.43% GMV lift, which is significant for the business.  ( 3 min )
    Increasing Performance And Sample Efficiency With Model-agnostic Interactive Feature Attributions. (arXiv:2306.16431v1 [cs.LG])
    Model-agnostic feature attributions can provide local insights in complex ML models. If the explanation is correct, a domain expert can validate and trust the model's decision. However, if it contradicts the expert's knowledge, related work only corrects irrelevant features to improve the model. To allow for unlimited interaction, in this paper we provide model-agnostic implementations for two popular explanation methods (Occlusion and Shapley values) to enforce entirely different attributions in the complex model. For a particular set of samples, we use the corrected feature attributions to generate extra local data, which is used to retrain the model to have the right explanation for the samples. Through simulated and real data experiments on a variety of models we show how our proposed approach can significantly improve the model's performance only by augmenting its training dataset based on corrected explanations. Adding our interactive explanations to active learning settings increases the sample efficiency significantly and outperforms existing explanatory interactive strategies. Additionally we explore how a domain expert can provide feature attributions which are sufficiently correct to improve the model.  ( 2 min )
    CMATH: Can Your Language Model Pass Chinese Elementary School Math Test?. (arXiv:2306.16636v1 [cs.CL])
We present the Chinese Elementary School Math Word Problems (CMATH) dataset, comprising 1.7k elementary-school-level math word problems with detailed annotations, sourced from actual Chinese workbooks and exams. This dataset aims to provide a benchmark tool for assessing the following question: to what grade level of elementary school math do the abilities of popular large language models (LLMs) correspond? We evaluate a variety of popular LLMs, including both commercial and open-source options, and discover that only GPT-4 achieves success (accuracy $\geq$ 60%) across all six elementary school grades, while other models falter at different grade levels. Furthermore, we assess the robustness of several top-performing LLMs by augmenting the original problems in the CMATH dataset with distracting information. Our findings reveal that GPT-4 maintains robustness, while the other models fail. We anticipate that our study will expose limitations in LLMs' arithmetic and reasoning capabilities, and promote their ongoing development and advancement.  ( 2 min )
    Understanding Pathologies of Deep Heteroskedastic Regression. (arXiv:2306.16717v1 [stat.ML])
    Several recent studies have reported negative results when using heteroskedastic neural regression models to model real-world data. In particular, for overparameterized models, the mean and variance networks are powerful enough to either fit every single data point (while shrinking the predicted variances to zero), or to learn a constant prediction with an output variance exactly matching every predicted residual (i.e., explaining the targets as pure noise). This paper studies these difficulties from the perspective of statistical physics. We show that the observed instabilities are not specific to any neural network architecture but are already present in a field theory of an overparameterized conditional Gaussian likelihood model. Under light assumptions, we derive a nonparametric free energy that can be solved numerically. The resulting solutions show excellent qualitative agreement with empirical model fits on real-world data and, in particular, prove the existence of phase transitions, i.e., abrupt, qualitative differences in the behaviors of the regressors upon varying the regularization strengths on the two networks. Our work thus provides a theoretical explanation for the necessity to carefully regularize heteroskedastic regression models. Moreover, the insights from our theory suggest a scheme for optimizing this regularization which is quadratically more efficient than the naive approach.  ( 2 min )
    Envisioning a Next Generation Extended Reality Conferencing System with Efficient Photorealistic Human Rendering. (arXiv:2306.16541v1 [cs.CV])
    Meeting online is becoming the new normal. Creating an immersive experience for online meetings is a necessity towards more diverse and seamless environments. Efficient photorealistic rendering of human 3D dynamics is the core of immersive meetings. Current popular applications achieve real-time conferencing but fall short in delivering photorealistic human dynamics, either due to limited 2D space or the use of avatars that lack realistic interactions between participants. Recent advances in neural rendering, such as the Neural Radiance Field (NeRF), offer the potential for greater realism in metaverse meetings. However, the slow rendering speed of NeRF poses challenges for real-time conferencing. We envision a pipeline for a future extended reality metaverse conferencing system that leverages monocular video acquisition and free-viewpoint synthesis to enhance data and hardware efficiency. Towards an immersive conferencing experience, we explore an accelerated NeRF-based free-viewpoint synthesis algorithm for rendering photorealistic human dynamics more efficiently. We show that our algorithm achieves comparable rendering quality while performing training and inference 44.5% and 213% faster than state-of-the-art methods, respectively. Our exploration provides a design basis for constructing metaverse conferencing systems that can handle complex application scenarios, including dynamic scene relighting with customized themes and multi-user conferencing that harmonizes real-world people into an extended world.  ( 2 min )
    Allocating Divisible Resources on Arms with Unknown and Random Rewards. (arXiv:2306.16578v1 [cs.LG])
We consider a decision maker allocating one unit of renewable and divisible resource in each period on a number of arms. The arms have unknown and random rewards whose means are proportional to the allocated resource and whose variances are proportional to an order $b$ of the allocated resource. In particular, if the decision maker allocates resource $A_i$ to arm $i$ in a period, then the reward $Y_i$ is $Y_i(A_i) = A_i \mu_i + A_i^b \xi_i$, where $\mu_i$ is the unknown mean and the noise $\xi_i$ is independent and sub-Gaussian. When the order $b$ ranges from 0 to 1, the framework smoothly bridges the standard stochastic multi-armed bandit and online learning with full feedback. We design two algorithms that attain the optimal gap-dependent and gap-independent regret bounds for $b \in [0,1]$, and demonstrate a phase transition at $b = 1/2$. The theoretical results hinge on a novel concentration inequality we have developed that bounds a linear combination of sub-Gaussian random variables whose weights are fractional, adapted to the filtration, and monotonic.  ( 2 min )
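The reward model is simple enough to simulate directly. The sketch below only illustrates how the order $b$ controls the noise scaling; the uniform allocation rule is a placeholder, not one of the paper's two algorithms.

```python
# A small simulation of the abstract's reward model
# Y_i(A_i) = A_i * mu_i + A_i**b * xi_i. The allocation rule is illustrative.
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.3, 0.5, 0.8])           # unknown means (3 arms)

def pull(alloc, b):
    xi = rng.normal(size=mu.shape)       # sub-Gaussian noise
    return alloc * mu + alloc ** b * xi

for b in (0.0, 0.5, 1.0):
    alloc = np.full(3, 1 / 3)            # uniform split of the one-unit budget
    rewards = np.mean([pull(alloc, b) for _ in range(10000)], axis=0)
    print(f"b={b}: mean reward estimate per arm = {np.round(rewards, 3)}")
```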
    Realistic Synthetic Financial Transactions for Anti-Money Laundering Models. (arXiv:2306.16424v1 [cs.AI])
With the widespread digitization of finance and the increasing popularity of cryptocurrencies, the sophistication of fraud schemes devised by cybercriminals is growing. Money laundering -- the movement of illicit funds to conceal their origins -- can cross bank and national boundaries, producing complex transaction patterns. The UN estimates that 2-5% of global GDP, or $0.8-$2.0 trillion, is laundered globally each year. Unfortunately, real data to train machine learning models to detect laundering is generally not available, and previous synthetic data generators have had significant shortcomings. A realistic, standardized, publicly available benchmark is needed for comparing models and for the advancement of the area. To this end, this paper contributes a synthetic financial transaction dataset generator and a set of synthetically generated AML (Anti-Money Laundering) datasets. We have calibrated this agent-based generator to match real transactions as closely as possible and made the datasets public. We describe the generator in detail and demonstrate how the generated datasets can help compare different Graph Neural Networks in terms of their AML abilities. In a key way, using synthetic data in these comparisons can be even better than using real data: the ground truth labels are complete, whereas many laundering transactions in real data are never detected.  ( 2 min )
    Forecasting of the development of a partially-observed dynamical time series with the aid of time-invariance and linearity. (arXiv:2306.16593v1 [stat.ME])
A dynamical system produces a dependent multivariate sequence, called a dynamical time series, developed with an evolution function. As the variables in a dynamical time series at the current time-point usually depend on all the variables at the previous time-point, existing studies forecast the variables at a future time-point by estimating the evolution function. However, in practical situations, some variables in the dynamical time series are missing. In this study, we propose an autoregressive with slack time series (ARS) model. The ARS model simultaneously estimates the evolution function and the underlying missing variables as a slack time series, with the aid of the time-invariance and linearity of the dynamical system. This study empirically demonstrates the effectiveness of the proposed ARS model.  ( 2 min )
    Neural networks can detect model-free static arbitrage strategies. (arXiv:2306.16422v1 [q-fin.CP])
    In this paper we demonstrate both theoretically as well as numerically that neural networks can detect model-free static arbitrage opportunities whenever the market admits some. Due to the use of neural networks, our method can be applied to financial markets with a high number of traded securities and ensures almost immediate execution of the corresponding trading strategies. To demonstrate its tractability, effectiveness, and robustness we provide examples using real financial data. From a technical point of view, we prove that a single neural network can approximately solve a class of convex semi-infinite programs, which is the key result in order to derive our theoretical results that neural networks can detect model-free static arbitrage strategies whenever the financial market admits such opportunities.  ( 2 min )
    Momentum Benefits Non-IID Federated Learning Simply and Provably. (arXiv:2306.16504v1 [cs.LG])
Federated learning is a powerful paradigm for large-scale machine learning, but it faces significant challenges due to unreliable network connections, slow communication, and substantial data heterogeneity across clients. FedAvg and SCAFFOLD are two fundamental algorithms to address these challenges. In particular, FedAvg employs multiple local updates before communicating with a central server, while SCAFFOLD maintains a control variable on each client to compensate for "client drift" in its local updates. Various methods have been proposed in the literature to enhance the convergence of these two algorithms, but they either make impractical adjustments to the algorithmic structure, or rely on the assumption of bounded data heterogeneity. This paper explores the utilization of momentum to enhance the performance of FedAvg and SCAFFOLD. When all clients participate in the training process, we demonstrate that incorporating momentum allows FedAvg to converge without relying on the assumption of bounded data heterogeneity even using a constant local learning rate. This is a novel result since existing analyses for FedAvg require bounded data heterogeneity even with diminishing local learning rates. In the case of partial client participation, we show that momentum enables SCAFFOLD to converge provably faster without imposing any additional assumptions. Furthermore, we use momentum to develop new variance-reduced extensions of FedAvg and SCAFFOLD, which exhibit state-of-the-art convergence rates. Our experimental results support all theoretical findings.  ( 2 min )
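As a rough illustration, here is a sketch of FedAvg with server-side momentum on a toy heterogeneous quadratic problem. Where exactly the paper places momentum in FedAvg/SCAFFOLD is an assumption in this sketch, as are all of the toy problem's details.

```python
# A hedged sketch of FedAvg + server momentum on heterogeneous quadratics.
import numpy as np

rng = np.random.default_rng(0)
d, n_clients, local_steps = 5, 4, 10
A = rng.normal(size=(n_clients, d))      # each client's own quadratic minimum

def local_update(x, a, lr=0.1):
    for _ in range(local_steps):
        x = x - lr * (x - a)             # gradient of 0.5 * ||x - a||^2
    return x

x, m, beta = np.zeros(d), np.zeros(d), 0.9
for _ in range(100):
    # Pseudo-gradient: average client drift after local training.
    delta = np.mean([local_update(x, A[i]) for i in range(n_clients)], axis=0) - x
    m = beta * m + delta                 # server-side momentum (assumption)
    x = x + m
print("distance to global optimum:", np.linalg.norm(x - A.mean(axis=0)))
```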
    Prediction of Rapid Early Progression and Survival Risk with Pre-Radiation MRI in WHO Grade 4 Glioma Patients. (arXiv:2306.16531v1 [cs.LG])
Recent clinical research describes a subset of glioblastoma patients that exhibit rapid early progression (REP) prior to the start of radiation therapy. Current literature has thus far described this population using clinicopathologic features. To our knowledge, this study is the first to investigate the potential of conventional radiomics, sophisticated multi-resolution fractal texture features, and different molecular features (MGMT, IDH mutations) as diagnostic and prognostic tools for distinguishing REP from non-REP cases using computational and statistical modeling methods. Radiation-planning T1 post-contrast (T1C) MRI sequences of 70 patients are analyzed. An ensemble method with 5-fold cross-validation over 1000 iterations offers an AUC of 0.793 with a standard deviation of 0.082 for REP versus non-REP classification. In addition, copula-based modeling under dependent censoring (where a subset of the patients may not be followed up until death) identifies significant features (p-value < 0.05) for survival probability and prognostic grouping of patient cases. The prediction of survival for the patient cohort achieves a precision of 0.881 with a standard deviation of 0.056. The prognostic index (PI) calculated using the fused features suggests that 84.62% of REP cases fall into the bad prognostic group, indicating the potential of the fused features to identify a higher percentage of REP cases. The experimental results further show that multi-resolution fractal texture features perform better than conventional radiomics features for REP and survival outcomes.  ( 3 min )
    SARC: Soft Actor Retrospective Critic. (arXiv:2306.16503v1 [cs.LG])
The two-time scale nature of SAC, which is an actor-critic algorithm, is characterised by the fact that the critic estimate has not converged for the actor at any given time, but since the critic learns faster than the actor, it ensures eventual consistency between the two. Various strategies have been introduced in the literature to learn better gradient estimates to help achieve better convergence. Since gradient estimates depend upon the critic, we posit that improving the critic can provide a better gradient estimate for the actor at each time. Utilizing this, we propose Soft Actor Retrospective Critic (SARC), where we augment the SAC critic loss with another loss term - retrospective loss - leading to faster critic convergence and, consequently, better policy gradient estimates for the actor. An existing implementation of SAC can be easily adapted to SARC with minimal modifications. Through extensive experimentation and analysis, we show that SARC provides consistent improvement over SAC on benchmark environments. We plan to open-source the code and all experiment data at: https://github.com/sukritiverma1996/SARC.  ( 2 min )
    BinaryViT: Pushing Binary Vision Transformers Towards Convolutional Models. (arXiv:2306.16678v1 [cs.CV])
    With the increasing popularity and the increasing size of vision transformers (ViTs), there has been an increasing interest in making them more efficient and less computationally costly for deployment on edge devices with limited computing resources. Binarization can be used to help reduce the size of ViT models and their computational cost significantly, using popcount operations when the weights and the activations are in binary. However, ViTs suffer a larger performance drop when directly applying convolutional neural network (CNN) binarization methods or existing binarization methods to binarize ViTs compared to CNNs on datasets with a large number of classes such as ImageNet-1k. With extensive analysis, we find that binary vanilla ViTs such as DeiT miss out on a lot of key architectural properties that CNNs have that allow binary CNNs to have much higher representational capability than binary vanilla ViT. Therefore, we propose BinaryViT, in which inspired by the CNN architecture, we include operations from the CNN architecture into a pure ViT architecture to enrich the representational capability of a binary ViT without introducing convolutions. These include an average pooling layer instead of a token pooling layer, a block that contains multiple average pooling branches, an affine transformation right before the addition of each main residual connection, and a pyramid structure. Experimental results on the ImageNet-1k dataset show the effectiveness of these operations that allow a binary pure ViT model to be competitive with previous state-of-the-art (SOTA) binary CNN models.  ( 3 min )
    Stochastic Methods in Variational Inequalities: Ergodicity, Bias and Refinements. (arXiv:2306.16502v1 [stat.ML])
    For min-max optimization and variational inequalities problems (VIP) encountered in diverse machine learning tasks, Stochastic Extragradient (SEG) and Stochastic Gradient Descent Ascent (SGDA) have emerged as preeminent algorithms. Constant step-size variants of SEG/SGDA have gained popularity, with appealing benefits such as easy tuning and rapid forgiveness of initial conditions, but their convergence behaviors are more complicated even in rudimentary bilinear models. Our work endeavors to elucidate and quantify the probabilistic structures intrinsic to these algorithms. By recasting the constant step-size SEG/SGDA as time-homogeneous Markov Chains, we establish a first-of-its-kind Law of Large Numbers and a Central Limit Theorem, demonstrating that the average iterate is asymptotically normal with a unique invariant distribution for an extensive range of monotone and non-monotone VIPs. Specializing to convex-concave min-max optimization, we characterize the relationship between the step-size and the induced bias with respect to the Von-Neumann's value. Finally, we establish that Richardson-Romberg extrapolation can improve proximity of the average iterate to the global solution for VIPs. Our probabilistic analysis, underpinned by experiments corroborating our theoretical discoveries, harnesses techniques from optimization, Markov chains, and operator theory.  ( 2 min )
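The Richardson-Romberg extrapolation mentioned at the end of the abstract admits a compact statement (a standard form; the paper's precise conditions are in the text):

```latex
% If the constant step-size average iterate has a first-order bias expansion
%   E[\bar z_\gamma] = z^* + c\,\gamma + O(\gamma^2),
% then running the method at step sizes \gamma and 2\gamma and combining
\bar z^{\mathrm{RR}} \;=\; 2\,\bar z_{\gamma} \;-\; \bar z_{2\gamma}
% cancels the first-order term, so that
\mathbb{E}\bigl[\bar z^{\mathrm{RR}}\bigr] \;=\; z^* + O(\gamma^2).
```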
    Shilling Black-box Review-based Recommender Systems through Fake Review Generation. (arXiv:2306.16526v1 [cs.IR])
Review-Based Recommender Systems (RBRSs) have attracted increasing research interest due to their ability to alleviate well-known cold-start problems. RBRSs utilize reviews to construct user and item representations. However, in this paper, we argue that such a reliance on reviews may instead expose systems to the risk of being shilled. To explore this possibility, we propose the first generation-based model for shilling attacks against RBRSs. Specifically, we learn a fake review generator through reinforcement learning, which maliciously promotes items by forcing prediction shifts after adding generated reviews to the system. By introducing auxiliary rewards to increase text fluency and diversity, with the aid of pre-trained language models and aspect predictors, the generated reviews can be effective for shilling with high fidelity. Experimental results demonstrate that the proposed framework can successfully attack three different kinds of RBRSs on the Amazon corpus with three domains and on the Yelp corpus. Furthermore, human studies show that the generated reviews are fluent and informative. Finally, equipped with Attack Review Generators (ARGs), RBRSs with adversarial training are much more robust to malicious reviews.  ( 2 min )
    Complex-valued Adaptive System Identification via Low-Rank Tensor Decomposition. (arXiv:2306.16428v1 [cs.LG])
    Machine learning (ML) and tensor-based methods have been of significant interest for the scientific community for the last few decades. In a previous work we presented a novel tensor-based system identification framework to ease the computational burden of tensor-only architectures while still being able to achieve exceptionally good performance. However, the derived approach only allows to process real-valued problems and is therefore not directly applicable on a wide range of signal processing and communications problems, which often deal with complex-valued systems. In this work we therefore derive two new architectures to allow the processing of complex-valued signals, and show that these extensions are able to surpass the trivial, complex-valued extension of the original architecture in terms of performance, while only requiring a slight overhead in computational resources to allow for complex-valued operations.  ( 2 min )
    Learning Fair Classifiers via Min-Max F-divergence Regularization. (arXiv:2306.16552v1 [cs.LG])
    As machine learning (ML) based systems are adopted in domains such as law enforcement, criminal justice, finance, hiring and admissions, ensuring the fairness of ML aided decision-making is becoming increasingly important. In this paper, we focus on the problem of fair classification, and introduce a novel min-max F-divergence regularization framework for learning fair classification models while preserving high accuracy. Our framework consists of two trainable networks, namely, a classifier network and a bias/fairness estimator network, where the fairness is measured using the statistical notion of F-divergence. We show that F-divergence measures possess convexity and differentiability properties, and their variational representation make them widely applicable in practical gradient based training methods. The proposed framework can be readily adapted to multiple sensitive attributes and for high dimensional datasets. We study the F-divergence based training paradigm for two types of group fairness constraints, namely, demographic parity and equalized odds. We present a comprehensive set of experiments for several real-world data sets arising in multiple domains (including COMPAS, Law Admissions, Adult Income, and CelebA datasets). To quantify the fairness-accuracy tradeoff, we introduce the notion of fairness-accuracy receiver operating characteristic (FA-ROC) and a corresponding \textit{low-bias} FA-ROC, which we argue is an appropriate measure to evaluate different classifiers. In comparison to several existing approaches for learning fair classifiers (including pre-processing, post-processing and other regularization methods), we show that the proposed F-divergence based framework achieves state-of-the-art performance with respect to the trade-off between accuracy and fairness.  ( 3 min )
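The variational representation the abstract relies on is the standard one for F-divergences (due to Nguyen, Wainwright, and Jordan); stating it makes clear why a trainable estimator network can stand in for the divergence:

```latex
% For convex f with convex conjugate f^*, and T ranging over measurable functions,
D_f(P \,\|\, Q) \;=\; \sup_{T}\; \mathbb{E}_{x \sim P}\bigl[T(x)\bigr]
  \;-\; \mathbb{E}_{x \sim Q}\bigl[f^*\bigl(T(x)\bigr)\bigr].
% Parameterizing T as a neural network yields a differentiable lower bound on
% the divergence between group-conditional prediction distributions, which the
% classifier can then be trained against.
```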
    Towards a Better Theoretical Understanding of Independent Subnetwork Training. (arXiv:2306.16484v1 [cs.LG])
    Modern advancements in large-scale machine learning would be impossible without the paradigm of data-parallel distributed computing. Since distributed computing with large-scale models imparts excessive pressure on communication channels, significant recent research has been directed toward co-designing communication compression strategies and training algorithms with the goal of reducing communication costs. While pure data parallelism allows better data scaling, it suffers from poor model scaling properties. Indeed, compute nodes are severely limited by memory constraints, preventing further increases in model size. For this reason, the latest achievements in training giant neural network models also rely on some form of model parallelism. In this work, we take a closer theoretical look at Independent Subnetwork Training (IST), which is a recently proposed and highly effective technique for solving the aforementioned problems. We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication, and provide a precise analysis of its optimization performance on a quadratic model.  ( 2 min )
    A Food Recommender System in Academic Environments Based on Machine Learning Models. (arXiv:2306.16528v1 [cs.IR])
Background: Proper diet is an important factor in people's health. Today, with the increasing mechanization of people's lives, proper eating habits and behaviors are neglected. Food recommendation in the field of health has tried to address this issue, but with the introduction of the Western nutrition style and the advancement of Western chemical medicine, many issues have emerged in the field of disease treatment and nutrition. Recent advances in technology and the use of artificial intelligence methods in information systems have led to the creation of recommender systems that aim to improve people's health. Methods: A hybrid recommender system including collaborative filtering, content-based, and knowledge-based models was used. Machine learning models such as Decision Tree, k-Nearest Neighbors (kNN), AdaBoost, and Bagging were investigated for food recommendation on 2519 students in the nutrition management system of a university. Student information, including profile information for basal metabolic rate, student reservation records, and selected diet type, is received online. Among the 15 features collected, and after consulting nutrition experts, the most effective features are selected through feature engineering. Using machine learning models based on energy indicators and the food selection history of students, food from the university menu is recommended to students. Results: The AdaBoost model has the highest performance in terms of accuracy, with a rate of 73.70 percent. Conclusion: Considering the importance of diet in people's health, recommender systems are effective in obtaining useful information from a huge amount of data. Keywords: Recommender system, Food behavior and habits, Machine learning, Classification  ( 3 min )
    Long-Term Hourly Scenario Generation for Correlated Wind and Solar Power combining Variational Autoencoders with Radial Basis Function Kernels. (arXiv:2306.16427v1 [cs.LG])
    Accurate generation of realistic future scenarios of renewable energy generation is crucial for long-term planning and operation of electrical systems, especially considering the increasing focus on sustainable energy and the growing penetration of renewable generation in energy matrices. These predictions enable power system operators and energy planners to effectively manage the variability and intermittency associated with renewable generation, allowing for better grid stability, improved energy management, and enhanced decision-making processes. In this paper, we propose an innovative method for generating long-term hourly scenarios for wind and solar power generation, taking into consideration the correlation between these two energy sources. To achieve this, we combine the capabilities of a Variational Autoencoder (VAE) with the additional benefits of incorporating the Radial Basis Function (RBF) kernel in our artificial neural network architecture. By incorporating them, we aim to obtain a latent space with improved regularization properties. To evaluate the effectiveness of our proposed method, we conduct experiments in a representative study scenario, utilizing real-world wind and solar power generation data from the Brazil system. We compare the scenarios generated by our model with the observed data and with other sets of scenarios produced by a conventional VAE architecture. Our experimental results demonstrate that the proposed method can generate long-term hourly scenarios for wind and solar power generation that are highly correlated, accurately capturing the temporal and spatial characteristics of these energy sources. Taking advantage of the benefits of RBF in obtaining a well-regularized latent space, our approach offers improved accuracy and robustness in generating long-term hourly scenarios for renewable energy generation.  ( 3 min )
    For Kernel Range Spaces a Constant Number of Queries Are Sufficient. (arXiv:2306.16516v1 [cs.CG])
    We introduce the notion of an $\varepsilon$-cover for a kernel range space. A kernel range space concerns a set of points $X \subset \mathbb{R}^d$ and the space of all queries by a fixed kernel (e.g., a Gaussian kernel $K(p,\cdot) = \exp(-\|p-\cdot\|^2)$). For a point set $X$ of size $n$, a query returns a vector of values $R_p \in \mathbb{R}^n$, where the $i$th coordinate $(R_p)_i = K(p,x_i)$ for $x_i \in X$. An $\varepsilon$-cover is a subset of points $Q \subset \mathbb{R}^d$ so for any $p \in \mathbb{R}^d$ that $\frac{1}{n} \|R_p - R_q\|_1\leq \varepsilon$ for some $q \in Q$. This is a smooth analog of Haussler's notion of $\varepsilon$-covers for combinatorial range spaces (e.g., defined by subsets of points within a ball query) where the resulting vectors $R_p$ are in $\{0,1\}^n$ instead of $[0,1]^n$. The kernel versions of these range spaces show up in data analysis tasks where the coordinates may be uncertain or imprecise, and hence one wishes to add some flexibility in the notion of inside and outside of a query range. Our main result is that, unlike combinatorial range spaces, the size of kernel $\varepsilon$-covers is independent of the input size $n$ and dimension $d$. We obtain a bound of $(1/\varepsilon)^{\tilde O(1/\varepsilon^2)}$, where $\tilde{O}(f(1/\varepsilon))$ hides log factors in $(1/\varepsilon)$ that can depend on the kernel. This implies that by relaxing the notion of boundaries in range queries, eventually the curse of dimensionality disappears, and may help explain the success of machine learning in very high-dimensions. We also complement this result with a lower bound of almost $(1/\varepsilon)^{\Omega(1/\varepsilon)}$, showing the exponential dependence on $1/\varepsilon$ is necessary.  ( 3 min )
  • Open

    Would I have gotten that reward? Long-term credit assignment by counterfactual contribution analysis. (arXiv:2306.16803v1 [cs.LG])
    To make reinforcement learning more sample efficient, we need better credit assignment methods that measure an action's influence on future rewards. Building upon Hindsight Credit Assignment (HCA), we introduce Counterfactual Contribution Analysis (COCOA), a new family of model-based credit assignment algorithms. Our algorithms achieve precise credit assignment by measuring the contribution of actions upon obtaining subsequent rewards, by quantifying a counterfactual query: "Would the agent still have reached this reward if it had taken another action?". We show that measuring contributions w.r.t. rewarding states, as is done in HCA, results in spurious estimates of contributions, causing HCA to degrade towards the high-variance REINFORCE estimator in many relevant environments. Instead, we measure contributions w.r.t. rewards or learned representations of the rewarding objects, resulting in gradient estimates with lower variance. We run experiments on a suite of problems specifically designed to evaluate long-term credit assignment capabilities. By using dynamic programming, we measure ground-truth policy gradients and show that the improved performance of our new model-based credit assignment methods is due to lower bias and variance compared to HCA and common baselines. Our results demonstrate how modeling action contributions towards rewarding outcomes can be leveraged for credit assignment, opening a new path towards sample-efficient reinforcement learning.  ( 2 min )
    Statistical Inference for High-Dimensional Linear Regression with Blockwise Missing Data. (arXiv:2106.03344v2 [stat.ME] UPDATED)
    Blockwise missing data occurs frequently when we integrate multisource or multimodality data where different sources or modalities contain complementary information. In this paper, we consider a high-dimensional linear regression model with blockwise missing covariates and a partially observed response variable. Under this framework, we propose a computationally efficient estimator for the regression coefficient vector based on carefully constructed unbiased estimating equations and a blockwise imputation procedure, and obtain its rate of convergence. Furthermore, building upon an innovative projected estimating equation technique that intrinsically achieves bias-correction of the initial estimator, we propose a nearly unbiased estimator for each individual regression coefficient, which is asymptotically normally distributed under mild conditions. Based on these debiased estimators, asymptotically valid confidence intervals and statistical tests about each regression coefficient are constructed. Numerical studies and application analysis of the Alzheimer's Disease Neuroimaging Initiative data show that the proposed method performs better and benefits more from unsupervised samples than existing methods.  ( 2 min )
    Joint Multi-view Unsupervised Feature Selection and Graph Learning. (arXiv:2204.08247v2 [cs.CV] UPDATED)
    Despite significant progress, previous multi-view unsupervised feature selection methods mostly suffer from two limitations. First, they generally utilize either cluster structure or similarity structure to guide the feature selection, which neglect the possibility of a joint formulation with mutual benefits. Second, they often learn the similarity structure by either global structure learning or local structure learning, which lack the capability of graph learning with both global and local structural awareness. In light of this, this paper presents a joint multi-view unsupervised feature selection and graph learning (JMVFG) approach. Particularly, we formulate the multi-view feature selection with orthogonal decomposition, where each target matrix is decomposed into a view-specific basis matrix and a view-consistent cluster indicator. The cross-space locality preservation is incorporated to bridge the cluster structure learning in the projected space and the similarity learning (i.e., graph learning) in the original space. Further, a unified objective function is presented to enable the simultaneous learning of the cluster structure, the global and local similarity structures, and the multi-view consistency and inconsistency, upon which an alternating optimization algorithm is developed with theoretically proved convergence. Extensive experiments on a variety of real-world multi-view datasets demonstrate the superiority of our approach for both the multi-view feature selection and graph learning tasks. The code is available at https://github.com/huangdonghere/JMVFG.  ( 3 min )
    Numerical Data Imputation for Multimodal Data Sets: A Probabilistic Nearest-Neighbor Kernel Density Approach. (arXiv:2306.16906v1 [stat.ML])
Numerical data imputation algorithms replace missing values by estimates to leverage incomplete data sets. Current imputation methods seek to minimize the error between the unobserved ground truth and the imputed values. But this strategy can create artifacts leading to poor imputation in the presence of multimodal or complex distributions. To tackle this problem, we introduce the $k$NN$\times$KDE algorithm: a data imputation method combining nearest neighbor estimation ($k$NN) and density estimation with Gaussian kernels (KDE). We compare our method with previous data imputation methods using artificial and real-world data with different missing-data scenarios and various missing rates, and show that our method can cope with complex original data structures, yields lower data imputation errors, and provides probabilistic estimates with higher likelihood than current methods. We release the code in open-source for the community: https://github.com/DeltaFloflo/knnxkde  ( 2 min )
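A hedged sketch of the $k$NN$\times$KDE idea follows: for a missing entry, find the k nearest neighbors on the observed coordinates, then impute from a Gaussian KDE placed on the neighbors' values. The bandwidth and the choice to sample from the KDE (rather than take its mean) are assumptions; the authors' repository contains the actual algorithm.

```python
# A hedged sketch combining kNN neighbor search with Gaussian KDE sampling.
import numpy as np

def knn_kde_impute(X, i, j, k=10, h=0.3, rng=None):
    """Impute the missing entry X[i, j] from a Gaussian KDE over the values
    of the k nearest neighbors, with distances computed on the coordinates
    observed for row i."""
    if rng is None:
        rng = np.random.default_rng(0)
    donors = np.array([r for r in range(len(X))
                       if r != i and not np.isnan(X[r, j])])
    cols = [c for c in range(X.shape[1]) if c != j and not np.isnan(X[i, c])]
    d = np.linalg.norm(X[np.ix_(donors, cols)] - X[i, cols], axis=1)
    nn = donors[np.argsort(d)[:k]]
    # Sampling from a KDE = pick a kernel center, then add Gaussian noise;
    # this keeps multimodality that a mean-based imputation would destroy.
    return rng.choice(X[nn, j]) + h * rng.normal()

X = np.random.default_rng(1).normal(size=(100, 3))
X[0, 2] = np.nan
print(knn_kde_impute(X, 0, 2))
```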
    Learning Mixtures of Gaussians with Censored Data. (arXiv:2305.04127v2 [cs.LG] UPDATED)
We study the problem of learning mixtures of Gaussians with censored data. Statistical learning with censored data is a classical problem with numerous practical applications; however, finite-sample guarantees for even simple latent variable models such as Gaussian mixtures are missing. Formally, we are given censored data from a mixture of univariate Gaussians $\sum_{i=1}^k w_i \mathcal{N}(\mu_i,\sigma^2)$, i.e. the sample is observed only if it lies inside a set $S$. The goal is to learn the weights $w_i$ and the means $\mu_i$. We propose an algorithm that takes only $\frac{1}{\varepsilon^{O(k)}}$ samples to estimate the weights $w_i$ and the means $\mu_i$ within $\varepsilon$ error.  ( 2 min )
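For readers unfamiliar with the setting, the observation law induced by "observed only if it lies inside $S$" is the truncated density (our reading of the setup as stated):

```latex
% Samples from the mixture density f are kept only when they fall in S,
% so the observed data follow
f_S(x) \;=\; \frac{f(x)\,\mathbf{1}\{x \in S\}}{\int_S f(u)\,du},
% and the learner must recover the weights w_i and means \mu_i of the
% underlying mixture f from i.i.d. draws of f_S alone.
```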
    Learning thermodynamically constrained equations of state with uncertainty. (arXiv:2306.17004v1 [physics.data-an])
    Numerical simulations of high energy-density experiments require equation of state (EOS) models that relate a material's thermodynamic state variables -- specifically pressure, volume/density, energy, and temperature. EOS models are typically constructed using a semi-empirical parametric methodology, which assumes a physics-informed functional form with many tunable parameters calibrated using experimental/simulation data. Since there are inherent uncertainties in the calibration data (parametric uncertainty) and the assumed functional EOS form (model uncertainty), it is essential to perform uncertainty quantification (UQ) to improve confidence in the EOS predictions. Model uncertainty is challenging for UQ studies since it requires exploring the space of all possible physically consistent functional forms. Thus, it is often neglected in favor of parametric uncertainty, which is easier to quantify without violating thermodynamic laws. This work presents a data-driven machine learning approach to constructing EOS models that naturally captures model uncertainty while satisfying the necessary thermodynamic consistency and stability constraints. We propose a novel framework based on physics-informed Gaussian process regression (GPR) that automatically captures total uncertainty in the EOS and can be jointly trained on both simulation and experimental data sources. A GPR model for the shock Hugoniot is derived and its uncertainties are quantified using the proposed framework. We apply the proposed model to learn the EOS for the diamond solid state of carbon, using both density functional theory data and experimental shock Hugoniot data to train the model and show that the prediction uncertainty reduces by considering the thermodynamic constraints.  ( 3 min )
    Medoid splits for efficient random forests in metric spaces. (arXiv:2306.17031v1 [stat.ME])
    This paper revisits an adaptation of the random forest algorithm for Fr\'echet regression, addressing the challenge of regression in the context of random objects in metric spaces. Recognizing the limitations of previous approaches, we introduce a new splitting rule that circumvents the computationally expensive operation of Fr\'echet means by substituting with a medoid-based approach. We validate this approach by demonstrating its asymptotic equivalence to Fr\'echet mean-based procedures and establish the consistency of the associated regression estimator. The paper provides a sound theoretical framework and a more efficient computational approach to Fr\'echet regression, broadening its application to non-standard data types and complex use cases.  ( 2 min )
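    The computational point is easy to see in code: a Fréchet mean requires an optimization over the metric space, while a medoid only needs pairwise distances among the sample points. A generic sketch:

        import numpy as np

        def medoid(points, dist):
            # the sample point with the smallest total distance to the others;
            # unlike a Frechet mean, no optimization outside the sample is needed
            D = np.array([[dist(a, b) for b in points] for a in points])
            return points[int(np.argmin(D.sum(axis=1)))]

        # usage with Euclidean points (any metric-space objects plus a dist work)
        pts = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 2.0])]
        m = medoid(pts, lambda a, b: float(np.linalg.norm(a - b)))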
    Temporal Robustness against Data Poisoning. (arXiv:2302.03684v2 [cs.LG] UPDATED)
    Data poisoning considers cases when an adversary manipulates the behavior of machine learning algorithms through malicious training data. Existing threat models of data poisoning center around a single metric, the number of poisoned samples. In consequence, if attackers can poison more samples than expected with affordable overhead, as in many practical scenarios, they may be able to render existing defenses ineffective in a short time. To address this issue, we leverage timestamps denoting the birth dates of data, which are often available but have been neglected in the past. Benefiting from these timestamps, we propose a temporal threat model of data poisoning with two novel metrics, earliness and duration, which respectively measure how far in advance an attack started and how long it lasted. Using these metrics, we define the notions of temporal robustness against data poisoning, providing a meaningful sense of protection even with unbounded amounts of poisoned samples. We present a benchmark with an evaluation protocol simulating continuous data collection and periodic deployments of updated models, thus enabling empirical evaluation of temporal robustness. Lastly, we develop and also empirically verify a baseline defense, namely temporal aggregation, offering provable temporal robustness and highlighting the potential of our temporal threat model for data poisoning.  ( 2 min )
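    As a rough illustration of the two metrics (the reference point chosen here is an assumption, not the paper's exact definition): given the birth-date timestamps of the poisoned samples and the time an attack takes effect, duration is the timestamp span of the poison, and earliness is how far ahead of the effect the poisoning began:

        from datetime import date

        # hypothetical birth dates of poisoned samples and the attack's effect time
        poison_dates = [date(2023, 1, 5), date(2023, 1, 20), date(2023, 2, 1)]
        effect_date = date(2023, 3, 1)

        duration = (max(poison_dates) - min(poison_dates)).days   # how long it lasted
        earliness = (effect_date - min(poison_dates)).days        # started in advance
        print(duration, earliness)  # 27 55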
    Solving Kernel Ridge Regression with Gradient-Based Optimization Methods. (arXiv:2306.16838v1 [stat.ML])
    Kernel ridge regression, KRR, is a non-linear generalization of linear ridge regression. Here, we introduce an equivalent formulation of the objective function of KRR, opening up both for using other penalties than the ridge penalty and for studying kernel ridge regression from the perspective of gradient descent. Using a continuous-time perspective, we derive a closed-form solution, kernel gradient flow, KGF, with regularization through early stopping, which allows us to theoretically bound the differences between KGF and KRR. We generalize KRR by replacing the ridge penalty with the $\ell_1$ and $\ell_\infty$ penalties and utilize the fact that analogously to the similarities between KGF and KRR, the solutions obtained when using these penalties are very similar to those obtained from forward stagewise regression (also known as coordinate descent) and sign gradient descent in combination with early stopping. Thus the need for computationally heavy proximal gradient descent algorithms can be alleviated. We show theoretically and empirically how these penalties, and corresponding gradient-based optimization algorithms, produce signal-driven and robust regression solutions, respectively. We also investigate kernel gradient descent where the kernel is allowed to change during training, and theoretically address the effects this has on generalization. Based on our findings, we propose an update scheme for the bandwidth of translational-invariant kernels, where we let the bandwidth decrease to zero during training, thus circumventing the need for hyper-parameter selection. We demonstrate on real and synthetic data how decreasing the bandwidth during training outperforms using a constant bandwidth, selected by cross-validation and marginal likelihood maximization. We also show that using a decreasing bandwidth, we are able to achieve both zero training error and a double descent behavior.  ( 3 min )
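    A loose sketch of the gradient-descent view, assuming an RBF kernel and squared loss: early stopping stands in for the ridge penalty, and the decay parameter mimics the proposed decreasing-bandwidth scheme (the schedule itself is an arbitrary choice here):

        import numpy as np

        def rbf(X, Z, bw):
            d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
            return np.exp(-d2 / (2 * bw ** 2))

        def kernel_gd(X, y, bw0, lr=0.1, steps=300, decay=1.0):
            # gradient descent on (1/2n)||y - K(bw) alpha||^2; stopping early
            # regularizes like the ridge penalty (a discrete analogue of KGF)
            alpha, bw = np.zeros(len(y)), bw0
            for _ in range(steps):
                K = rbf(X, X, bw)   # recomputed since the bandwidth may change
                alpha -= lr * K @ (K @ alpha - y) / len(y)
                bw *= decay         # decay < 1 shrinks the bandwidth during training
            return alpha, bw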
    Differentially Private Algorithms for the Stochastic Saddle Point Problem with Optimal Rates for the Strong Gap. (arXiv:2302.12909v2 [cs.LG] UPDATED)
    We show that convex-concave Lipschitz stochastic saddle point problems (also known as stochastic minimax optimization) can be solved under the constraint of $(\epsilon,\delta)$-differential privacy with \emph{strong (primal-dual) gap} rate of $\tilde O\big(\frac{1}{\sqrt{n}} + \frac{\sqrt{d}}{n\epsilon}\big)$, where $n$ is the dataset size and $d$ is the dimension of the problem. This rate is nearly optimal, based on existing lower bounds in differentially private stochastic optimization. Specifically, we prove a tight upper bound on the strong gap via novel implementation and analysis of the recursive regularization technique repurposed for saddle point problems. We show that this rate can be attained with $O\big(\min\big\{\frac{n^2\epsilon^{1.5}}{\sqrt{d}}, n^{3/2}\big\}\big)$ gradient complexity, and $\tilde{O}(n)$ gradient complexity if the loss function is smooth. As a byproduct of our method, we develop a general algorithm that, given a black-box access to a subroutine satisfying a certain $\alpha$ primal-dual accuracy guarantee with respect to the empirical objective, gives a solution to the stochastic saddle point problem with a strong gap of $\tilde{O}(\alpha+\frac{1}{\sqrt{n}})$. We show that this $\alpha$-accuracy condition is satisfied by standard algorithms for the empirical saddle point problem such as the proximal point method and the stochastic gradient descent ascent algorithm. Further, we show that even for simple problems it is possible for an algorithm to have zero weak gap and suffer from $\Omega(1)$ strong gap. We also show that there exists a fundamental tradeoff between stability and accuracy. Specifically, we show that any $\Delta$-stable algorithm has empirical gap $\Omega\big(\frac{1}{\Delta n}\big)$, and that this bound is tight. This result also holds, more specifically, for empirical risk minimization problems and may be of independent interest.  ( 3 min )
    Finite-Sample Symmetric Mean Estimation with Fisher Information Rate. (arXiv:2306.16573v1 [math.ST])
    The mean of an unknown variance-$\sigma^2$ distribution $f$ can be estimated from $n$ samples with variance $\frac{\sigma^2}{n}$ and nearly corresponding subgaussian rate. When $f$ is known up to translation, this can be improved asymptotically to $\frac{1}{n\mathcal I}$, where $\mathcal I$ is the Fisher information of the distribution. Such an improvement is not possible for general unknown $f$, but [Stone, 1975] showed that this asymptotic convergence $\textit{is}$ possible if $f$ is $\textit{symmetric}$ about its mean. Stone's bound is asymptotic, however: the $n$ required for convergence depends in an unspecified way on the distribution $f$ and failure probability $\delta$. In this paper we give finite-sample guarantees for symmetric mean estimation in terms of Fisher information. For every $f, n, \delta$ with $n > \log \frac{1}{\delta}$, we get convergence close to a subgaussian with variance $\frac{1}{n \mathcal I_r}$, where $\mathcal I_r$ is the $r$-$\textit{smoothed}$ Fisher information with smoothing radius $r$ that decays polynomially in $n$. Such a bound essentially matches the finite-sample guarantees in the known-$f$ setting.  ( 2 min )
    Understanding Graph Neural Networks with Generalized Geometric Scattering Transforms. (arXiv:1911.06253v5 [stat.ML] UPDATED)
    The scattering transform is a multilayered wavelet-based deep learning architecture that acts as a model of convolutional neural networks. Recently, several works have introduced generalizations of the scattering transform for non-Euclidean settings such as graphs. Our work builds upon these constructions by introducing windowed and non-windowed geometric scattering transforms for graphs based upon a very general class of asymmetric wavelets. We show that these asymmetric graph scattering transforms have many of the same theoretical guarantees as their symmetric counterparts. As a result, the proposed construction unifies and extends known theoretical results for many of the existing graph scattering architectures. In doing so, this work helps bridge the gap between geometric scattering and other graph neural networks by introducing a large family of networks with provable stability and invariance guarantees. These results lay the groundwork for future deep learning architectures for graph-structured data that have learned filters and also provably have desirable theoretical properties.  ( 2 min )
    Provable Advantage of Curriculum Learning on Parity Targets with Mixed Inputs. (arXiv:2306.16921v1 [cs.LG])
    Experimental results have shown that curriculum learning, i.e., presenting simpler examples before more complex ones, can improve the efficiency of learning. Some recent theoretical results also showed that changing the sampling distribution can help neural networks learn parities, with formal results only for large learning rates and one-step arguments. Here we show a separation result in the number of training steps with standard (bounded) learning rates on a common sample distribution: if the data distribution is a mixture of sparse and dense inputs, there exists a regime in which a 2-layer ReLU neural network trained by a curriculum noisy-GD (or SGD) algorithm that uses sparse examples first, can learn parities of sufficiently large degree, while any fully connected neural network of possibly larger width or depth trained by noisy-GD on the unordered samples cannot learn without additional steps. We also provide experimental results supporting the qualitative separation beyond the specific regime of the theoretical results.  ( 2 min )
    Gradient flows on graphons: existence, convergence, continuity equations. (arXiv:2111.09459v3 [math.PR] UPDATED)
    Wasserstein gradient flows on probability measures have found a host of applications in various optimization problems. They typically arise as the continuum limit of exchangeable particle systems evolving by some mean-field interaction involving a gradient-type potential. However, in many problems, such as in multi-layer neural networks, the so-called particles are edge weights on large graphs whose nodes are exchangeable. Such large graphs are known to converge to continuum limits called graphons as their size grows to infinity. We show that the Euclidean gradient flow of a suitable function of the edge-weights converges to a novel continuum limit given by a curve on the space of graphons that can be appropriately described as a gradient flow or, more technically, a curve of maximal slope. Several natural functions on graphons, such as homomorphism functions and the scalar entropy, are covered by our set-up, and the examples have been worked out in detail.  ( 2 min )
    Local Risk Bounds for Statistical Aggregation. (arXiv:2306.17151v1 [math.ST])
    In the problem of aggregation, the aim is to combine a given class of base predictors to achieve predictions nearly as accurate as the best one. In this flexible framework, no assumption is made on the structure of the class or the nature of the target. Aggregation has been studied in both sequential and statistical contexts. Despite some important differences between the two problems, the classical results in both cases feature the same global complexity measure. In this paper, we revisit and tighten classical results in the theory of aggregation in the statistical setting by replacing the global complexity with a smaller, local one. Some of our proofs build on the PAC-Bayes localization technique introduced by Catoni. Among other results, we prove localized versions of the classical bound for the exponential weights estimator due to Leung and Barron and deviation-optimal bounds for the Q-aggregation estimator. These bounds improve over the results of Dai, Rigollet and Zhang for fixed design regression and the results of Lecu\'e and Rigollet for random design regression.  ( 2 min )
    Lossy Image Compression with Conditional Diffusion Models. (arXiv:2209.06950v5 [eess.IV] UPDATED)
    This paper outlines an end-to-end optimized lossy image compression framework using diffusion generative models. The approach relies on the transform coding paradigm, where an image is mapped into a latent space for entropy coding and, from there, mapped back to the data space for reconstruction. In contrast to VAE-based neural compression, where the (mean) decoder is a deterministic neural network, our decoder is a conditional diffusion model. Our approach thus introduces an additional "content" latent variable on which the reverse diffusion process is conditioned and uses this variable to store information about the image. The remaining "texture" variables characterizing the diffusion process are synthesized at decoding time. We show that the model's performance can be tuned toward perceptual metrics of interest. Our extensive experiments involving multiple datasets and image quality assessment metrics show that our approach yields stronger reported FID scores than the GAN-based model, while also yielding competitive performance with VAE-based models in several distortion metrics. Furthermore, training the diffusion with X-parameterization enables high-quality reconstructions in only a handful of decoding steps, greatly affecting the model's practicality.  ( 2 min )
    Efficient and Multiply Robust Risk Estimation under General Forms of Dataset Shift. (arXiv:2306.16406v2 [stat.ME] UPDATED)
    Statistical machine learning methods often face the challenge of limited data available from the population of interest. One remedy is to leverage data from auxiliary source populations, which share some conditional distributions or are linked in other ways with the target domain. Techniques leveraging such \emph{dataset shift} conditions are known as \emph{domain adaptation} or \emph{transfer learning}. Despite extensive literature on dataset shift, limited works address how to efficiently use the auxiliary populations to improve the accuracy of risk evaluation for a given machine learning task in the target population. In this paper, we study the general problem of efficiently estimating target population risk under various dataset shift conditions, leveraging semiparametric efficiency theory. We consider a general class of dataset shift conditions, which includes three popular conditions -- covariate, label and concept shift -- as special cases. We allow for partially non-overlapping support between the source and target populations. We develop efficient and multiply robust estimators along with a straightforward specification test of these dataset shift conditions. We also derive efficiency bounds for two other dataset shift conditions, posterior drift and location-scale shift. Simulation studies support the efficiency gains due to leveraging plausible dataset shift conditions.  ( 2 min )
    Safe Model-Based Multi-Agent Mean-Field Reinforcement Learning. (arXiv:2306.17052v1 [cs.LG])
    Many applications, e.g., in shared mobility, require coordinating a large number of agents. Mean-field reinforcement learning addresses the resulting scalability challenge by optimizing the policy of a representative agent. In this paper, we address an important generalization where there exist global constraints on the distribution of agents (e.g., requiring capacity constraints or minimum coverage requirements to be met). We propose Safe-$\text{M}^3$-UCRL, the first model-based algorithm that attains safe policies even in the case of unknown transition dynamics. As a key ingredient, it uses epistemic uncertainty in the transition model within a log-barrier approach to ensure pessimistic constraints satisfaction with high probability. We showcase Safe-$\text{M}^3$-UCRL on the vehicle repositioning problem faced by many shared mobility operators and evaluate its performance through simulations built on Shenzhen taxi trajectory data. Our algorithm effectively meets the demand in critical areas while ensuring service accessibility in regions with low demand.  ( 2 min )
    Stochastic Methods in Variational Inequalities: Ergodicity, Bias and Refinements. (arXiv:2306.16502v1 [stat.ML])
    For min-max optimization and variational inequalities problems (VIP) encountered in diverse machine learning tasks, Stochastic Extragradient (SEG) and Stochastic Gradient Descent Ascent (SGDA) have emerged as preeminent algorithms. Constant step-size variants of SEG/SGDA have gained popularity, with appealing benefits such as easy tuning and rapid forgetting of initial conditions, but their convergence behaviors are more complicated even in rudimentary bilinear models. Our work endeavors to elucidate and quantify the probabilistic structures intrinsic to these algorithms. By recasting the constant step-size SEG/SGDA as time-homogeneous Markov Chains, we establish a first-of-its-kind Law of Large Numbers and a Central Limit Theorem, demonstrating that the average iterate is asymptotically normal with a unique invariant distribution for an extensive range of monotone and non-monotone VIPs. Specializing to convex-concave min-max optimization, we characterize the relationship between the step-size and the induced bias with respect to von Neumann's value. Finally, we establish that Richardson-Romberg extrapolation can improve proximity of the average iterate to the global solution for VIPs. Our probabilistic analysis, underpinned by experiments corroborating our theoretical discoveries, harnesses techniques from optimization, Markov chains, and operator theory.  ( 2 min )
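    As a small illustration of the Richardson-Romberg point (constants illustrative; the bilinear game is just the simplest monotone example): the average iterate of constant-step SEG carries a step-size-dependent bias, so combining two runs at step sizes eta and eta/2 cancels its first-order term:

        import numpy as np

        def seg_avg(eta, iters=100_000, seed=0):
            # constant-step stochastic extragradient on min_x max_y x*y,
            # whose operator is F(x, y) = (y, -x); returns the average iterate
            rng = np.random.default_rng(seed)
            F = lambda z: np.array([z[1], -z[0]])
            z, total = np.array([1.0, 1.0]), np.zeros(2)
            for _ in range(iters):
                z_half = z - eta * (F(z) + rng.normal(scale=0.1, size=2))
                z = z - eta * (F(z_half) + rng.normal(scale=0.1, size=2))
                total += z
            return total / iters

        eta = 0.1
        z_rr = 2 * seg_avg(eta / 2, seed=1) - seg_avg(eta, seed=2)  # bias-corrected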
    Trajectory Poisson multi-Bernoulli mixture filter for traffic monitoring using a drone. (arXiv:2306.16890v1 [cs.CV])
    This paper proposes a multi-object tracking (MOT) algorithm for traffic monitoring using a drone equipped with optical and thermal cameras. Object detections on the images are obtained using a neural network for each type of camera. The cameras are modelled as direction-of-arrival (DOA) sensors. Each DOA detection follows a von Mises-Fisher distribution, whose mean direction is obtained by projecting a vehicle position on the ground to the camera. We then use the trajectory Poisson multi-Bernoulli mixture filter (TPMBM), which is a Bayesian MOT algorithm, to optimally estimate the set of vehicle trajectories. We have also developed a parameter estimation algorithm for the measurement model. We have tested the accuracy of the resulting TPMBM filter in synthetic and experimental data sets.  ( 2 min )
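    The measurement model is simple to state in code; a sketch of the mean-direction computation, assuming the vehicle lies on the ground plane z = 0 and the camera pose is known:

        import numpy as np

        def doa_mean_direction(vehicle_xy, camera_pos):
            # unit vector from the camera to the vehicle's ground position; this is
            # the mean direction of the von Mises-Fisher detection distribution
            d = np.array([vehicle_xy[0], vehicle_xy[1], 0.0]) - camera_pos
            return d / np.linalg.norm(d)

        mu = doa_mean_direction([12.0, 7.5], camera_pos=np.array([10.0, 5.0, 50.0]))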
    Likelihood-free neural Bayes estimators for censored inference with peaks-over-threshold models. (arXiv:2306.15642v2 [stat.ME] UPDATED)
    Inference for spatial extremal dependence models can be computationally burdensome in moderate-to-high dimensions due to their reliance on intractable and/or censored likelihoods. Exploiting recent advances in likelihood-free inference with neural Bayes estimators (that is, neural estimators that target Bayes estimators), we develop a novel approach to construct highly efficient estimators for censored peaks-over-threshold models by encoding censoring information in the neural network architecture. Our new method provides a paradigm shift that challenges traditional censored likelihood-based inference for spatial extremes. Our simulation studies highlight significant gains in both computational and statistical efficiency, relative to competing likelihood-based approaches, when applying our novel estimators for inference of popular extremal dependence models, such as max-stable, $r$-Pareto, and random scale mixture processes. We also illustrate that it is possible to train a single estimator for a general censoring level, obviating the need to retrain when the censoring level is changed. We illustrate the efficacy of our estimators by making fast inference on hundreds-of-thousands of high-dimensional spatial extremal dependence models to assess particulate matter 2.5 microns or less in diameter (PM2.5) concentration over the whole of Saudi Arabia.  ( 2 min )
    Automatic Calibration and Error Correction for Large Language Models via Pareto Optimal Self-Supervision. (arXiv:2306.16564v1 [cs.CL])
    Large language models (LLMs) have demonstrated remarkable capabilities out of the box for a wide range of applications, yet accuracy still remains a major growth area, especially in mission-critical domains such as biomedicine. An effective method to calibrate the confidence level on LLM responses is essential to automatically detect errors and facilitate human-in-the-loop verification. An important source of calibration signals stems from expert-stipulated programmatic supervision, which is often available at low cost but has its own limitations such as noise and coverage. In this paper, we introduce a Pareto optimal self-supervision framework that can leverage available programmatic supervision to systematically calibrate LLM responses by producing a risk score for every response, without any additional manual effort. This is accomplished by learning a harmonizer model to align LLM output with other available supervision sources, which would assign higher risk scores to more uncertain LLM responses and facilitate error correction. Experiments on standard relation extraction tasks in biomedical and general domains demonstrate the promise of this approach, with our proposed risk scores highly correlated with the real error rate of LLMs. For the most uncertain test instances, dynamic prompting based on our proposed risk scores results in significant accuracy improvement for off-the-shelf LLMs, boosting GPT-3 results past state-of-the-art (SOTA) weak supervision and GPT-4 results past SOTA supervised results on challenging evaluation datasets.  ( 2 min )
    The Local Approach to Causal Inference under Network Interference. (arXiv:2105.03810v4 [econ.EM] UPDATED)
    We propose a new nonparametric modeling framework for causal inference when outcomes depend on how agents are linked in a social or economic network. Such network interference describes a large literature on treatment spillovers, social interactions, social learning, information diffusion, disease and financial contagion, social capital formation, and more. Our approach works by first characterizing how an agent is linked in the network using the configuration of other agents and connections nearby as measured by path distance. The impact of a policy or treatment assignment is then learned by pooling outcome data across similarly configured agents. We demonstrate the approach by proposing an asymptotically valid test for the hypothesis of policy irrelevance/no treatment effects and bounding the mean-squared error of a k-nearest-neighbor estimator for the average or distributional policy effect/treatment response.  ( 2 min )
    Causal inference for the expected number of recurrent events in the presence of a terminal event. (arXiv:2306.16571v1 [stat.ME])
    We study causal inference and efficient estimation for the expected number of recurrent events in the presence of a terminal event. We define our estimand as the vector comprising both the expected number of recurrent events and the failure survival function evaluated along a sequence of landmark times. We identify the estimand in the presence of right-censoring and causal selection as an observed data functional under coarsening at random, derive the nonparametric efficiency bound, and propose a multiply-robust estimator that achieves the bound and permits nonparametric estimation of nuisance parameters. Throughout, no absolute continuity assumption is made on the underlying probability distributions of failure, censoring, or the observed data. Additionally, we derive the class of influence functions when the coarsening distribution is known and review how published estimators may belong to the class. Along the way, we highlight some interesting inconsistencies in the causal lifetime analysis literature.  ( 2 min )
    Superiority of GNN over NN in generalizing bandlimited functions. (arXiv:2206.05904v7 [cs.LG] UPDATED)
    Graph Neural Network (GNN) with its ability to integrate graph information has been widely used for data analyses. However, the expressive power of GNN has only been studied for graph-level tasks but not for node-level tasks, such as node classification, where one tries to interpolate missing nodal labels from the observed ones. In this paper, we study the expressive power of GNN for the said classification task, which is in essence a function interpolation problem. Explicitly, we derive the number of weights and layers needed for a GNN to interpolate a band-limited function in $\mathbb{R}^d$. Our result shows that the number of weights needed to $\epsilon$-approximate a bandlimited function using the GNN architecture is much smaller than the best known one using a fully connected neural network (NN) - in particular, one only needs $O((\log \epsilon^{-1})^{d})$ weights using a GNN trained by $O((\log \epsilon^{-1})^{d})$ samples to $\epsilon$-approximate a discretized bandlimited signal in $\mathbb{R}^d$. The result is obtained by drawing a connection between the GNN structure and the classical sampling theorems, making our work the first attempt in this direction.  ( 3 min )
    UTOPIA: Universally Trainable Optimal Prediction Intervals Aggregation. (arXiv:2306.16549v1 [stat.ME])
    Uncertainty quantification for prediction is an intriguing problem with significant applications in various fields, such as biomedical science, economic studies, and weather forecasts. Numerous methods are available for constructing prediction intervals, such as quantile regression and conformal predictions, among others. Nevertheless, model misspecification (especially in high-dimension) or sub-optimal constructions can frequently result in biased or unnecessarily-wide prediction intervals. In this paper, we propose a novel and widely applicable technique for aggregating multiple prediction intervals to minimize the average width of the prediction band along with coverage guarantee, called Universally Trainable Optimal Predictive Intervals Aggregation (UTOPIA). The method also allows us to directly construct predictive bands based on elementary basis functions. Our approach is based on linear or convex programming which is easy to implement. All of our proposed methodologies are supported by theoretical guarantees on the coverage probability and optimal average length, which are detailed in this paper. The effectiveness of our approach is convincingly demonstrated by applying it to synthetic data and two real datasets on finance and macroeconomics.  ( 2 min )
    Neural networks can detect model-free static arbitrage strategies. (arXiv:2306.16422v1 [q-fin.CP])
    In this paper we demonstrate both theoretically as well as numerically that neural networks can detect model-free static arbitrage opportunities whenever the market admits some. Due to the use of neural networks, our method can be applied to financial markets with a high number of traded securities and ensures almost immediate execution of the corresponding trading strategies. To demonstrate its tractability, effectiveness, and robustness we provide examples using real financial data. From a technical point of view, we prove that a single neural network can approximately solve a class of convex semi-infinite programs, which is the key result in order to derive our theoretical results that neural networks can detect model-free static arbitrage strategies whenever the financial market admits such opportunities.  ( 2 min )
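    For intuition only (this is a classical discretized baseline, not the paper's neural method): on a finite grid of terminal states the semi-infinite program collapses to an ordinary LP, and a strictly negative optimal cost with nonnegative payoffs flags a static arbitrage:

        import numpy as np
        from scipy.optimize import linprog

        def detect_static_arbitrage(prices, payoffs):
            # prices: (m,) costs of the securities; payoffs: (num_states, m) payoff
            # of each security on a sampled grid of terminal states. Minimize the
            # portfolio cost subject to a nonnegative payoff in every sampled state,
            # with box-bounded positions; a strictly negative optimum flags arbitrage.
            m = len(prices)
            res = linprog(prices, A_ub=-np.asarray(payoffs),
                          b_ub=np.zeros(len(payoffs)),
                          bounds=[(-1.0, 1.0)] * m, method="highs")
            return res.fun < -1e-8, res.x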

  • Open

    I love this art style!
    ​ https://preview.redd.it/nkqhl3ynn89b1.png?width=768&format=png&auto=webp&s=9030640323813257a97f256fd8a67f1a492d7e9a Steps to replicate: Typed the following on Nack Ai: /imagine ((A full body portrait of a handsome 18 year old boy with an afro, in trendy black hoody)) ((Disney style animation, handsome face, rounded eyes, digital painting, fine exquisite details, fan art)) ((Background is a beautiful well lit cyberpunk alley)) ((graffiti on walls)) maximalist, cartoon-style.🌃🌃 (cgi-style)(ray-tracing lighting)((splashed multicolored paint))((multicolor paint dripping)), gamer headphones 🎧 Fantastical special effects in the background. ((Multicolor paint streaks)) rainbow prism bubbles floating in the air. | negate: | width: 768 | height: 1024 | images: 4 | prompt_weight: 7.0 | steps: 60 | model: Dreamshaper v6 | prompt_enhancement: Yes submitted by /u/rich_awo [link] [comments]  ( 8 min )
    Is there an AI music generator that can produce music similar to a particular song?
    For example, if I want to create a song similar to Fireflies by Owl City to use with a YouTube video. Not a remix but a different song inspired by another. submitted by /u/createdtexan [link] [comments]  ( 8 min )
    What is the most modern technique for generating a video from an image+driver video?
    NVIDIA's Maxine and Live Portrait microservices look very promising, but they are still in early access. If you dig into Live Portrait, it seems to be built off of a three-year-old GitHub repo. Example from the repo: https://i.redd.it/emp3xe70s79b1.gif Another big one I have found is the first-order-model repo, which is a little older and seems to do about the same kind of thing: https://i.redd.it/cb8z7xf1s79b1.gif I would think there would have been a lot of advancements in the last three years. Are these the best libraries to use if you want something off the shelf? Is there a name for this particular type of technology or class of algorithms? submitted by /u/blender_noob_5000 [link] [comments]  ( 8 min )
    AI — weekly megathread!
    This week in AI - partnered with aibrews.com feel free to follow their newsletter News & Insights Microsoft has launched AI-powered shopping tools in Bing search and Edge, including AI-generated buying guides which automatically aggregate product specifications and purchase locations for user queries​, and AI-generated review summaries that provide concise overviews of online product reviews [Details]. Salesforce AI Research released XGen-7B, a new open-source 7B LLM trained on 8K input sequence length for 1.5T tokens [Details| Huggingface| GitHub]. Researchers present DreamDiffusion, a novel method for generating high-quality images directly from brain EEG signals without the need to translate thoughts into text [Paper]. Google announced the first Machine Unlearning Challenge hoste…  ( 10 min )
    Is there an AI tool for creating/combining a note?
    So I just got a task from my boss to create a note containing specific details/checkmarks. Some context: I work in sales (telecommunications) and we use a note where we can write down specific information, for example how many devices a customer uses, how many people are using the Wi-Fi connection, etc. Now he asked me if I could take the current note we use (it’s a double-sided DIN A4 paper) and add some new information/checkmarks to it, so that the employees ask all the relevant questions. My question is: is there an AI tool which is able to create/combine a PDF based on the content I’ll provide to it (in the form of a PDF, image or text)? submitted by /u/FlowerpotxD [link] [comments]  ( 8 min )
    I'm looking for a comic which shows a female warrior illegally infiltrating an AI facility where a lot of humans are hooked into machines, numb with pleasure in the style of The Matrix, and she chooses to do the same
    I am trying to find it but I can't. Does anyone have it? It had multiple panels of the story just like in a comic book submitted by /u/br_shadow [link] [comments]  ( 8 min )
    Talk to me! (Conversational AI in adult VR game)
    submitted by /u/VR_hot [link] [comments]  ( 8 min )
    AI Psychedelic Jewelry [workflow in comments]
    submitted by /u/PeePeePeePooPooPooo [link] [comments]  ( 8 min )
    Is this feasible?
    I’ve been thinking a lot lately about social media (Meta) and other large tech companies (Google) that profit off of our screen time by showing ads and collecting data to sell. I know this is not a new topic, but I had an idea for going on offense, instead of just defense with engines like DuckDuckGo. Is it possible with AI to build an app that automatically searches, clicks, and interacts with content on a social media site, or performs searches and interactions on a search engine? Ideally a person could set the parameters, e.g. kittens, how to xyz, rainbows, muscle cars, etc. This could run in the background while we are sleeping and the device is charging. After a time, the algorithm would produce ads catered to these searches as the profile these tech companies build on us starts to morph into whatever we pick. As it morphs, the value of selling our data to advertisers lessens as the integrity of the data falls apart. These companies, as many know, apply dark patterns and other psychological tactics to keep us engaged and disconnected from the real world. Thoughts on whether this sort of program would be possible? submitted by /u/Pristine-Balance1827 [link] [comments]  ( 9 min )
    Snoop Dogg announces the cage match between Elon and Mark | Midjourney + Charactr API
    submitted by /u/3nd4u [link] [comments]  ( 8 min )
    A model which allows itself to learn?
    I downloaded my DNA data from a company where I took an ancestry test and was wondering if GPT-4 would be able to find health issues or anything else useful in it; it is obviously too long as an input, and I got an error. Now I have this question: suppose GPT-4 has no idea how to handle such a request. What kind of model could receive the request, understand that it cannot do it, and then trigger something internal that grows new neurons when necessary and new connections with the appropriate neurons, so that it can eventually make sense of what I ask? submitted by /u/NotLegal69 [link] [comments]  ( 8 min )
    AI system devises first optimizations to sorting code in over a decade
    submitted by /u/eberkut [link] [comments]  ( 8 min )
    How close are we to developing a 'conscience' for AI?
    Are there any companies/startups/researchers working on this problem? Any good solutions proposed? It seems like somebody must be close to it by now. submitted by /u/medphysics [link] [comments]  ( 8 min )
    Meta explains the AI behind its social media algorithms
    submitted by /u/dahmedahe [link] [comments]  ( 8 min )
    Captivating Symbiosis: Exploring the Enigmatic Dance of Human Creativity and Artificial Intelligence
    I hope you're all having a fantastic day filled with mind-boggling insights! Today, I want to delve into the mesmerizing realm where human creativity intertwines with the vast potential of artificial intelligence. While the concept of AI has been at the forefront of our discussions, it's time to shift our focus and explore the profound symbiosis between human artists and their AI counterparts. As we witness the remarkable advancements in AI-generated art, it's easy to get lost in the magic that unfolds on our screens. With every brushstroke, every note played, and every word written, AI algorithms have been infiltrating the creative sphere, making their mark in ways we couldn't have imagined before. One could argue that AI art is a mere mimicry of human expression, lacking the profound e…  ( 9 min )
    I wonder if language models like phi-1, with quality training data, could be used to train better language models by using their emergent properties alone
    https://the-decoder.com/microsofts-tiny-phi-1-language-model-shows-the-importance-of-data-quality-in-ai-training/ ​ submitted by /u/loopy_fun [link] [comments]  ( 8 min )
    Pick Any 2 TV Shows To Crossover
    Since AI is getting quite popular, I can see it being useful for people to generate content they aren’t otherwise getting. For instance, I was thinking about a Grey’s Anatomy crossover with Desperate Housewives. Since that’s likely never going to happen, perhaps AI can generate something! Which shows or even characters would you like to see crossover? I’d also like to throw out the Z Nation vs Walking Dead teams doing a brawl between the two in the style of ‘Deadliest Warrior’ submitted by /u/parryfinkle [link] [comments]  ( 8 min )
    Where would I find someone who I could pay to make fake audio for me?
    I’m trying to make realistic-sounding audio using ElevenLabs, but the voice sounds a little fake, and I want background noises over the audio like wind and clothing rustling. How can I get this done? Thanks! submitted by /u/drydrip126 [link] [comments]  ( 8 min )
  • Open

    DJL with RL?
    Has anyone used the Deep Java Library with RL? I found an example (TicTacToe), but it doesn't quite work as the example is from an old (0.60) version and the libraries are much newer. I'm playing with the library (I really prefer statically typed languages), but having trouble finding docs related to RL. If anyone has experience or can point me at some resources that discuss using DJL with RL, that'd be great. submitted by /u/scprotz [link] [comments]  ( 8 min )
    Is there any experience replay sampling method that takes into consideration frequency?
    Hello, I am researching the compression of convolutional neural networks using reinforcement learning. I use images from the dataset to compress each of the layers in a model. One of the issues I am facing is that the agent gets stuck in local minima, due to randomness and because the datasets have 80k images. I select a random batch of images for every test, as using all 80k images consumes too much time (I fear this might also be the cause of my issue). I have tried agents such as Dueling DQN, DDPG and PPO. For all of them, I see that training gets stuck despite having explored better actions. I implemented prioritized experience replay (PER) so that samples with higher temporal-difference errors have a higher probability of being selected. Nonetheless, I see that the Dueling DQN never chose the best sequence of actions despite it being found during exploration. Only three actions are taken, so episodes are short. I was thinking about using Upper Confidence Bound every few epochs to select the best actions that might not be selected due to having low TD error. Any thoughts on this? Also, if anyone would be kind enough to point out an experience replay scheme that takes into consideration the number of times each sample was selected, I would greatly appreciate it. Thanks in advance. submitted by /u/ElvishChampion [link] [comments]  ( 9 min )
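    One way to prototype the UCB idea from the post: keep a per-sample draw count next to the TD-error priority and add a count-based bonus at sampling time. A rough sketch (the bonus form and the mixing constant c are free choices, not a published PER variant):

        import numpy as np

        class CountAwareReplay:
            def __init__(self, capacity, c=1.0):
                self.capacity, self.c = capacity, c
                self.data, self.td, self.counts = [], [], []
                self.total_draws = 1

            def add(self, transition, td_error):
                if len(self.data) >= self.capacity:   # evict the oldest entry
                    self.data.pop(0); self.td.pop(0); self.counts.pop(0)
                self.data.append(transition)
                self.td.append(abs(td_error) + 1e-6)
                self.counts.append(1)

            def sample(self, batch_size):
                td = np.asarray(self.td)
                # UCB-style bonus: rarely drawn samples get a lift even when their
                # TD error (and hence their PER priority) is low
                bonus = self.c * np.sqrt(np.log(self.total_draws)
                                         / np.asarray(self.counts))
                p = (td + bonus) / (td + bonus).sum()
                idx = np.random.choice(len(self.data), size=batch_size, p=p)
                for i in idx:
                    self.counts[i] += 1
                self.total_draws += batch_size
                return [self.data[i] for i in idx]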
    DDPG fails to learn a seemingly simple 2D room navigation problem
    Hello everyone, I am currently working on training a DDPG (Deep Deterministic Policy Gradient) agent for a navigation problem in a 2D room. The room is a square measuring 30m x 30m, with a star representing the central location. The goal is to control two players within the room using DDPG. Each player can move a fixed distance of 0.5m in any direction, and the DDPG actor network outputs two angles (ranging from 0 to 2*pi) for each player, representing the direction of their movements. The new coordinates (x_i, y_i) of each player are updated based on these angles (with clamping to keep the players in the room). I have designed a simple reward function for each step, where the reward is calculated as follows: reward = - (r_1 ** 2 + r_2 ** 2) / normalization_factor. Here, r_1 and r_2 r…  ( 9 min )
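    A sketch of the environment step as described; the distances r_1 and r_2 are taken to the central star, and the normalization factor is an arbitrary choice, since the post is truncated at that point:

        import numpy as np

        STEP, SIZE = 0.5, 30.0
        CENTER = np.array([15.0, 15.0])   # the star, assumed at the room's centre

        def env_step(positions, angles):
            # positions: (2, 2), one row per player; angles: two values in [0, 2*pi)
            moves = STEP * np.stack([np.cos(angles), np.sin(angles)], axis=1)
            positions = np.clip(positions + moves, 0.0, SIZE)   # clamp to the room
            r = np.linalg.norm(positions - CENTER, axis=1)
            reward = -(r[0] ** 2 + r[1] ** 2) / (2 * SIZE ** 2) # normalization assumed
            return positions, reward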
    Future Rewards in n-step semi-gradient TD for estimating a value function
    I am trying to understand the following algorithm from Sutton and Barto. https://preview.redd.it/cdchk04e269b1.png?width=1240&format=png&auto=webp&s=e6cd2ffa91121dbc0c1396c12c0e19eddb6b4597 I am confused about this line https://preview.redd.it/62b7zlnq269b1.png?width=404&format=png&auto=webp&s=b4acb2bb61069720aedbd3e45236278351efdea4 which unrolls to https://preview.redd.it/ct1uxe5t269b1.png?width=1178&format=png&auto=webp&s=4c33c008c40326ecf7ac99f76109605979765a32 My question is: at timestep t we don't have access to any of the future rewards, so how is the summation over (t, T) computed? submitted by /u/ElendirThreadripper [link] [comments]  ( 8 min )
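    For reference, a tabular sketch of the algorithm's timing (env, policy, and the value table V are assumed interfaces): the update performed at real time t targets the state visited at tau = t - n + 1, so every reward in the sum, R_{tau+1} through R_{min(tau+n, T)}, has already been observed by step t; no reward from the future of the current step is needed.

        def nstep_td(env, policy, V, n, alpha, gamma, episodes):
            for _ in range(episodes):
                S, R = [env.reset()], [0.0]   # R[t] = reward received entering S[t]
                T, t = float("inf"), 0
                while True:
                    if t < T:
                        s, r, done = env.step(policy(S[t]))
                        S.append(s); R.append(r)
                        if done:
                            T = t + 1
                    tau = t - n + 1           # state whose value is updated now
                    if tau >= 0:
                        # all rewards in this sum were observed at steps <= t
                        G = sum(gamma ** (i - tau - 1) * R[i]
                                for i in range(tau + 1, min(tau + n, T) + 1))
                        if tau + n < T:
                            G += gamma ** n * V[S[tau + n]]
                        V[S[tau]] += alpha * (G - V[S[tau]])
                    if tau == T - 1:
                        break
                    t += 1
            return V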
    Libraries for Offline RL and Off-Policy Evaluation?
    Hey, so I'm pretty new to the reinforcement learning world, but not to machine/deep learning. I've done some simple one-file implementations on relatively simple RL tasks, but I'm looking to do some offline learning where I can't simulate my environment and so I also want to do off-policy evaluation. The main one-stop shop that I've seen so far is RLlib with all its offline learning algorithms and off-policy evaluation (OPE) techniques implemented and ready to go, but I've heard negative reviews about the performance overhead (I don't mind the learning curve and I have access to a large cluster). I've also seen d3rlpy and tianshou, but they don't really have OPE implementations - so I would be open to using them if anyone has any recs where to find things like OPE. Anyone have any suggestions where to go? This is out of my comfort zone with deep learning, so I figure it would be better to reach out here to everyone with so much more experience. submitted by /u/EchoMyGecko [link] [comments]  ( 8 min )
    Diambra.ai: The brand-new platform for training and competing RL algorithms on retro-fighting games
    submitted by /u/DIAMBRA_AIArena [link] [comments]  ( 8 min )
    RL algorithms that establish causation through experiment?
    Are there any algorithms in RL which proceed in a way to establish causation through interventions in the environment? The interventions would proceed by carrying out experiments in which confounding variables are included and then excluded. This process of trying combinations of variables would continue until the entire collection of experiments allow for the isolation of causes. By interventions, I am roughly referring to their use in chapter §6.3 of this book https://library.oapen.org/handle/20.500.12657/26040 If this has not been formalized within RL, why hasn't it been tried? Is there some fundamental aspect of RL which is violated by doing this kind of learning? submitted by /u/moschles [link] [comments]  ( 8 min )
    Methods to keep agents inside a grid world.
    Hi experts, I'd like to ask for your experience with keeping agents inside a grid world. How do you normally construct your environment so this issue doesn't happen? The structure of my environment is roughly (reconstructed into runnable form):

        import random

        class Env:
            def __init__(self, x, y):
                self.x, self.y = x, y                 # grid dimensions
                self.agent_location = [0, 0]

            def move(self, action):
                # takes an 8-directional move based on the action
                ...

            def step(self, action):
                tmp = self.agent_location.copy()      # saved before the agent moves
                self.move(action)
                out_of_bounds = (not 0 <= self.agent_location[0] < self.x
                                 or not 0 <= self.agent_location[1] < self.y)
                if out_of_bounds:
                    # restore the saved location and take a random move instead
                    self.agent_location = tmp
                    self.move(random.randrange(8))

    My problem is that whatever I do, the agent still leaves the grid. This isn't about giving a negative reward so it learns not to leave; in general, how do you keep your agent inside the environment? I'd appreciate any replies. submitted by /u/vincent0110 [link] [comments]  ( 8 min )
    RL for finance
    I want to create a lightweight implementation utilizing reinforcement learning (RL) for the FINQA dataset, which is essentially question answering over financial documents, but the required compute (an NVIDIA TITAN with 24GB GDDR6) seems pretty high for a personal project. Are there any other papers or ways to reduce the resource requirements so that it becomes compatible with a GTX 1660 Ti? Any leads/project ideas in the finance sector with RL would be of great help as well. Thanks submitted by /u/Priyanshu1_ [link] [comments]  ( 8 min )
    List secret ML repositories to contribute to?
    I believe that open-source contributions will significantly impact candidate selection processes. Although I'm relatively new to contributing to ML models and data science, I'm eager to get involved. It would help me greatly if people could share bookmarked projects that I could contribute to. Thank you!! submitted by /u/trafalgar28 [link] [comments]  ( 8 min )
    Update on last post; snake CNN DQN game still not learning
    This is now my updated code: my GitHub gist. As per suggestions, I've implemented a replay buffer as well as periodic copying into my target network to stabilize learning; however, the rewards seem to spike up and down and do not get better overall. I've also altered the CNN model so that it's more robust. After training it for about 10 hours the rewards are not improving. I was wondering if anyone would be willing and able to give me some pointers on where I could go with this? prior post: prior post I'm about to throw in the towel, start over, abandon the CNN, and just do the head-relative snake game like most people. submitted by /u/bleak-terminal [link] [comments]  ( 8 min )
    Jamie Simon, UC Berkeley: On theoretical principles for how neural networks learn and generalize
    submitted by /u/thejashGI [link] [comments]  ( 8 min )
  • Open

    Cayley graphs in Mathematica
    The previous post mentioned the group S4, the group of all permutations of a set with four elements. This post will show a way to visualize this group. The Mathematica command CayleyGraph[ SymmetricGroup[4], VertexLabels -> Placed["Name", Center], VertexSize -> 0.4] generates the graph below. This is an interesting image, but what does it mean? The […] Cayley graphs in Mathematica first appeared on John D. Cook.  ( 5 min )
    Permutations and centralizers in Mathematica
    I had a fleeting thought about how little I use group theory when I realized I had used it just this week. A couple of days ago I needed to know which permutations of 4 elements commute with reversal. If r takes a sequence and reverses it, I need to find all permutations p such that p ∘ […] Permutations and centralizers in Mathematica first appeared on John D. Cook.  ( 5 min )
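    Though the post works in Mathematica, the computation is a few lines of brute force in any language; a quick check in Python, with permutations as tuples composed by (p ∘ q)(i) = p[q[i]]:

        from itertools import permutations

        n = 4
        r = tuple(reversed(range(n)))                 # reversal: i -> n-1-i

        def compose(p, q):
            return tuple(p[q[i]] for i in range(n))  # (p o q)(i) = p[q[i]]

        centralizer = [p for p in permutations(range(n))
                       if compose(p, r) == compose(r, p)]
        print(len(centralizer))   # 8: the centralizer of reversal in S4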
    Floors and roots
    I recently stumbled across the identity while reading [1]. Here ⌊x⌋ is the floor of x, the largest integer not larger than x. My first thought was that this was hard to believe. Although the floor function is very simple, its interactions with other functions tend to be complicated. I was surprised that such a […] Floors and roots first appeared on John D. Cook.  ( 5 min )
  • Open

    Don't use AI detectors for anything important
    I've noted before that because AI detectors produce false positives, it's unethical to use them to detect cheating. Now there's a new study that shows it's even worse. Not only do AI detectors falsely flag human-written text as AI-written, the way in  ( 6 min )
    Bonus: Several ways of looking at a sandwich hole
    In an effort to make AI detectors stop classifying a paragraph from my book as AI generated, I experimented with a few different ways of getting chatGPT to reword it. Here are some of my favorites.  ( 3 min )
  • Open

    [D] Career
    Hi Reddit, I'm a computer science student and I'm confused about which branch to choose. I'm interested in AI, but this field is very wide. Honestly, I want to learn something that will make me money, so please help me choose a branch in AI (or something better) and briefly tell me about some jobs or ways to make money with it. submitted by /u/saf_saf_ [link] [comments]  ( 8 min )
    [R] Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture - Meta AI, Yann LeCun et al. - Code + checkpoints released!
    Paper: https://arxiv.org/abs/2301.08243 Github: https://github.com/facebookresearch/ijepa Includes code + checkpoints! Abstract: This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations. We introduce the Image-based Joint-Embedding Predictive Architecture (I-JEPA), a non-generative approach for self-supervised learning from images. The idea behind I-JEPA is simple: from a single context block, predict the representations of various target blocks in the same image. A core design choice to guide I-JEPA towards producing semantic representations is the masking strategy; specifically, it is crucial to (a) sample target blocks with sufficiently large scale (semantic), and to (b) use a sufficiently informative (spatially distributed) context block. Empirically, when combined with Vision Transformers, we find I-JEPA to be highly scalable. For instance, we train a ViT-Huge/14 on ImageNet using 16 A100 GPUs in under 72 hours to achieve strong downstream performance across a wide range of tasks, from linear classification to object counting and depth prediction. https://preview.redd.it/k3e7l15b579b1.jpg?width=726&format=pjpg&auto=webp&s=2bc88fad0ac10d398d95fde2069e8d2c403a2ebb https://preview.redd.it/lx3hk55b579b1.jpg?width=668&format=pjpg&auto=webp&s=1650e3e8ee17af97e3614704cd422a018bc18ff8 https://preview.redd.it/nhx0725b579b1.jpg?width=1249&format=pjpg&auto=webp&s=cfda0bf3d4ea07a2e43ce989337fe00e021b013e https://preview.redd.it/p285y55b579b1.jpg?width=522&format=pjpg&auto=webp&s=b975c23a60e8e27d7f218bc6ba6b25f8f711d3b1 https://preview.redd.it/ipa7v25b579b1.jpg?width=1483&format=pjpg&auto=webp&s=bff6c87ec22d88e5a00aebea8e160765639140fd ​ ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [P] Hybrid TSP Solver
    Hello everyone, this is my inaugural post in this community, and I'm excited to share my master's degree project. It's an innovative TSP (Traveling Salesman Problem) solver that blends the 1-tree branch-and-bound algorithm of Held and Karp with a Graph Convolutional Network. I've made the entire codebase available in this GitHub repository, along with detailed information and the results obtained. I'm posting this to get feedback on my work, including your thoughts on potential areas for improvement and whether you believe some of these concepts could also be applied to Concorde. submitted by /u/Lorenzos98 [link] [comments]  ( 8 min )
    [N] Training LLMs with AMD MI250 GPUs and MosaicML
    https://www.mosaicml.com/blog/amd-mi250 AMD datacenter GPUs are finally viable for ML training! Hope this will increase supply and reduce GPU prices and training costs in the long term. submitted by /u/ml_hardware [link] [comments]  ( 8 min )
    [D] How are folks daisy chaining ML models today?
    It seems quite common to have different models trained on different tasks where the output of one model is the input to the next. Does any tooling exist to support this? Is it a frustrating enough problem that folks would be willing to outsource it and then just get an inference endpoint? Totally just fielding ideas here for climb.dev submitted by /u/vanlifecoder [link] [comments]  ( 8 min )
    [D] How do you keep up with ML news on Twitter? Suggestions for people to follow?
    A few of my friends always talk about how they find so many papers and state-of-the-art research just from Twitter, but they work in CV whereas I work more with stats / privacy / other ML approaches (e.g. not much DL). I've followed people who have published papers I have liked, but many times they either are not on Twitter or are highly inactive. Occasionally they'll announce a paper, but typically our interests only partially overlap, so the papers I do get are only tangential to my research. Likewise, all the big / "creator" ML people mainly post about LLMs / diffusion models, which also aren't relevant to my research. Is there a better way to narrow down Twitter to keep up with the state of the art in your subfield than just following people who publish interesting papers as you find them? submitted by /u/Amun-Aion [link] [comments]  ( 8 min )
    [R] Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models
    submitted by /u/gabloa [link] [comments]  ( 8 min )
    [P] Need project idea on many-particle system
    Basically the title. I'm getting started with OpenGL (not sure if it's needed, since there's the freud library for generating particles), but I would like to work on particle systems, and any niche machine learning use case is welcome. I have a lot of time to complete this, so I'm open to research as well. Any other applications in shaders, ray tracing, screen space, or anything with pixels are welcome too. submitted by /u/Remote_Ad5597 [link] [comments]  ( 8 min )
    [D] OptiTalk
    Yo guys, what do you think of this website? It's similar to character.ai. It looks much cleaner though, in my opinion. https://optitalk.net/ submitted by /u/CommissarCheeseball [link] [comments]  ( 8 min )
    [P]: Generate tweets about a current topic, based on a username's style
    I built this demo with Haystack Agents, using a WebSearch tool to account for topics the LLM may not have knowledge of (think the latest news, etc.), and a TwitterRetriever tool, which is a custom Haystack node that retrieves the latest tweets (15 in this case) for a given username. Result: you can ask something like: What would the Twitter user dog_feelings say about the Titan submarine? The demo is available on HuggingFace and the code is also hosted on GitHub: https://github.com/TuanaCelik/what-would-mother-say https://preview.redd.it/k0pexdywj69b1.png?width=1324&format=png&auto=webp&s=bfa1740e5172fa900af684b5ac905f798b77f83a submitted by /u/tuanacelik [link] [comments]  ( 8 min )
    Need a roadmap to create AI for the teaching industry [D]
    I'm thinking about creating an AI for teaching, where the AI marks students' work. Students would write on an iPad or another tablet with a touch pen, and the AI would mark it and provide tailored questions. For example, if a student gets a certain question wrong, the AI generates similar questions for them; if a student is weak in math, it provides easier questions, and if a student is strong, it provides harder ones. The AI would also explain the concepts behind each specific mistake, give verbal explanations to students, and, based on a student's performance, create a report. Can someone give me a roadmap to create such an AI with the above functions? submitted by /u/andyoh115 [link] [comments]  ( 8 min )
    [P] Secret Santa: Serve SantaCoder for code completion without exposing code IP using BlindBox
    Hi there! We have recently released an example where we use BlindBox, our open-source confidential deployment solution, to serve SantaCoder for confidential code completion. Would be happy to have your feedback! Gradio demo: https://huggingface.co/spaces/mithril-security/Santacoder-demo Blog post: https://blog.mithrilsecurity.io/ai-assisted-code-generation-with-privacy-guarantees-securely-deploy-santacoder-with-blindbox/ Docs: https://blindbox.mithrilsecurity.io/en/integration-task/docs/how-to-guides/santacoder/?ref=blog.mithrilsecurity.io GitHub: https://github.com/mithril-security/blindbox/ submitted by /u/Separate-Still3770 [link] [comments]  ( 8 min )
    [R] Image dataset for detecting defects in the manufacturing of screws
    Can anyone suggest where to find/buy a dataset of that kind? I'd like to run some experiments on a production line without starting from a generic dataset. Thank you! submitted by /u/jiiins [link] [comments]  ( 8 min )
    [D] Switch Net and Stochastic Gradient Descent
    [ Removed by Reddit on account of violating the content policy. ] submitted by /u/AI462QQQ [link] [comments]  ( 8 min )
    [D] Books for ML - different levels, any suggestions?
    Hello :) ​ TLDR, please suggest some interesting books for ML that cover introductory, intermediate, and advanced topics. ​ Our department has a surplus, so I suggested using part of this money to add new books to the department library. Here is the list of books that I gathered to add: The Elements of Statistical Learning. Second Edition. Trevor Hastie, Robert Tibshirani, and Jerome Friedman. Reinforcement learning, An Introduction. Second Edition. Richard S. Sutton and Andrew G. Barto. Deep Learning. Illustrated Edition. Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Second Edition. Steven L. Brunton and J. Nathan Kutz. Mathematics for Machine Learning. Deisenroth, A. Aldo Faisal, and C…  ( 9 min )
    [Discussion] How to use LDA effectively?
I have scraped Glassdoor reviews. The reviews are split into two columns: Pros (what the employees liked about the company) and Cons (what they didn't like). After all the necessary text pre-processing (removing stopwords, punctuation, and symbols; stemming and lemmatization) and visualizing wordclouds, I thought about running topic modeling algorithms to better understand my data, since wordclouds are not enough. To the best of my limited knowledge, I should be testing multiple algorithms, so I chose LDA, NMF, HDP & LSA. To find the best-performing algorithm, I chose to calculate these metrics: coherence score, topic diversity, topic coverage, topic stability, and perplexity. I started with LDA: I varied the number of topics and computed these metrics for each setting. I was wondering if I'm doing this correctly, because I'll be doing the same for the other 3 algorithms; the number of topics is my hyperparameter, and I have to rely on the different metrics to reach the most optimal result. This is what I plotted at the end of the LDA runs: https://preview.redd.it/h8i2knika49b1.png?width=1589&format=png&auto=webp&s=5b8a74fc1bf1626d7174ac902c3b7e16e420bdbd submitted by /u/Miserable_Run_6377 [link] [comments]  ( 9 min )
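As a reference point for the coherence sweep, a minimal gensim sketch along these lines (the token lists are placeholders for the pre-processed reviews, and c_v is only one of several coherence measures):
```
from gensim.corpora import Dictionary
from gensim.models import LdaModel
from gensim.models.coherencemodel import CoherenceModel

# placeholder token lists standing in for the pre-processed Pros/Cons reviews
texts = [
    ["flexible", "hours", "great", "culture", "manager"],
    ["low", "pay", "long", "hours", "management"],
]
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

# sweep the number of topics and score each model with c_v coherence
for k in range(2, 15):
    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=k,
                   random_state=0, passes=10)
    c_v = CoherenceModel(model=lda, texts=texts, dictionary=dictionary,
                         coherence="c_v").get_coherence()
    print(k, round(c_v, 3))
```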
    [P] Reinforcement learning Question Answering
I want to create a lightweight implementation utilizing reinforcement learning (RL) for the FinQA dataset, which is essentially question answering over financial documents, but the computational resources required (an NVIDIA TITAN with 24GB GDDR6) seem pretty high for a personal project. Are there any papers or ways to reduce the resource requirements so that it becomes compatible with a GTX 1660 Ti? Any leads/project ideas in the finance sector with RL would be of great help as well. submitted by /u/Priyanshu1_ [link] [comments]  ( 8 min )
    [R] HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution
    Paper https://arxiv.org/abs/2306.15794 Blog https://hazyresearch.stanford.edu/blog/2023-06-29-hyena-dna Colab https://colab.research.google.com/drive/1wyVEQd4R3HYLTUOXEEQmp_I8aNC_aLhL?usp=sharing Abstract Genomic (DNA) sequences encode an enormous amount of information for gene regulation and protein synthesis. Similar to natural language models, researchers have proposed foundation models in genomics to learn generalizable features from unlabeled genome data that can then be fine-tuned for downstream tasks such as identifying regulatory elements. Due to the quadratic scaling of attention, previous Transformer-based genomic models have used 512 to 4k tokens as context (<0.001% of the human genome), significantly limiting the modeling of long-range interactions in DNA. In addition, t…  ( 9 min )
    [D] Anyone here actively use chatGPT, bard, copilot, etc. while coding and reviewing PRs? How do you tend to use it?
    Anyone here actively use chatGPT, bard, copilot, etc. while coding and reviewing PRs? How do you tend to use it? submitted by /u/MLtinkerer [link] [comments]  ( 8 min )
    [P] Transform Your Ideas into Reality with DemoGPT: A New Way to Prototype Using 🦜️🔗LangChain
Hi Folks! Ever wished for a way to transform your ideas into prototypes rapidly, using simple textual prompts? Look no further! I'm thrilled to introduce DemoGPT, an innovative tool that empowers you to create demos using Large Language Models (LLMs) quickly. Here's the magic: DemoGPT leverages LangChain to generate LLM-based application code from your ideas. You can prototype and get a jump start on your app code, no matter your coding expertise level. Rapid prototyping, accessibility to everyone, seamless integration with LangChain for code generation, and being open source are some of the highlights of DemoGPT. Check out this game-changing tool on GitHub and start bringing your ideas to life: https://github.com/melih-unsal/DemoGPT Can't wait to see what you create! Best, Melih submitted by /u/demogpt [link] [comments]  ( 8 min )
  • Open

    Generalizing Adversarial Robustness with Confidence-Calibrated Adversarial Training in PyTorch
Taking adversarial training from this previous article as a baseline, this article introduces a new, confidence-calibrated variant of adversarial training that addresses two significant flaws: first, when trained with L∞ adversarial examples, adversarial training is not robust against L2 ones; second, it incurs a significant increase in (clean) test error. Confidence-calibrated adversarial training addresses these problems by encouraging lower confidence on adversarial examples and subsequently rejecting them. The post Generalizing Adversarial Robustness with Confidence-Calibrated Adversarial Training in PyTorch appeared first on David Stutz.  ( 10 min )
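The core of the approach, as described, is to train against soft targets that move from the one-hot label toward the uniform distribution as the perturbation grows, and to reject low-confidence inputs at test time. A minimal PyTorch sketch of such a loss (the power-law transition and its rho exponent are assumptions about one reasonable shape, not necessarily the article's exact choice):
```
import torch
import torch.nn.functional as F

def confidence_calibrated_loss(logits, labels, delta_norm, eps, num_classes, rho=1.0):
    # transition factor: 1 for clean inputs, decaying to 0 as ||delta|| approaches eps
    lam = (1.0 - torch.clamp(delta_norm / eps, 0.0, 1.0)) ** rho
    one_hot = F.one_hot(labels, num_classes).float()
    uniform = torch.full_like(one_hot, 1.0 / num_classes)
    # soft target: interpolate between the true label and maximal uncertainty
    target = lam.unsqueeze(1) * one_hot + (1.0 - lam).unsqueeze(1) * uniform
    return -(target * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
```
At inference, inputs whose maximum predicted probability falls below a chosen threshold are rejected rather than classified.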
  • Open

    Failing to train and avoid overfit on noisy training data
I have added some Gaussian noise to the CIFAR10 training and test sets. I am using VGG16 and ResNet34 as the models for training. Under normal training conditions, where the standard CIFAR10 is used (without any Gaussian noise), training proceeds as expected: the model does not overfit when trained with the right transforms and data augmentation, and I am able to achieve 95% training accuracy and 91% test accuracy. But if I add minor Gaussian noise to the entire training and test set and train the model using the same transforms, the model overfits considerably: training accuracy reaches 98%, whereas test accuracy does not go past 80%. My initial reasoning for this was that I used color jittering in the transforms during normal training (when using clean data). But since colo…  ( 9 min )
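For reproducibility, one common way to inject such noise is a small custom transform applied after ToTensor; a sketch (the std value and the CIFAR-10 normalization constants are typical choices, not taken from the post):
```
import torch
from torchvision import transforms

class AddGaussianNoise:
    """Additive Gaussian noise on a tensor image in [0, 1] (apply after ToTensor)."""
    def __init__(self, std=0.05):
        self.std = std
    def __call__(self, x):
        return (x + torch.randn_like(x) * self.std).clamp(0.0, 1.0)

train_tf = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    AddGaussianNoise(std=0.05),
    # standard CIFAR-10 channel statistics
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
```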
    MNIST neural network can't improve after 3000 Epochs
Hello, I am new to neural networks and am trying to make a multilayer perceptron network. I'm not sure whether the problem is with my forward pass or my backward pass. I use a learning rate of 0.02, 2 hidden layers of 30 neurons each, and a batch size of 32. With this setup it achieves an accuracy of 9.8%, which is about chance level. I've tried lowering the LR, changing the size of the network, and other things. All the code is in Java. Here is the code for the backpropagation:
```
public Deltas BackPropagation(Prediction Out) {
    // δ: important Greek letter (output-layer error term)
    double[] δ = Mult(
        Subtract(Out.Outputs, Out.Expected),
        ActivationPrime(Add(multiply(OutputWeights, Out.HiddenLayers[Out.HiddenLayers.length - 1]), OutputBias)));
    double[][] OutputWeightDeltas = new double[OutputWeights.length][OutputWeights[0].length];
    …
```
 ( 9 min )
    List secret ML repositories to contribute?
    I believe that open source contributions will significantly impact candidate selection processes. Although I'm relatively new to contributing to ML models and data science, I'm eager to get involved. It would greatly benefit me if individuals could share their bookmarked projects that I could contribute to. Thank you!! submitted by /u/trafalgar28 [link] [comments]  ( 8 min )
  • Open

    When computer vision works more like a brain, it sees more like people do
    Training artificial neural networks with data from real brains can make computer vision more robust.  ( 10 min )
    Educating national security leaders on artificial intelligence
    Experts from MIT’s School of Engineering, Schwarzman College of Computing, and Sloan Executive Education educate national security leaders in AI fundamentals.  ( 9 min )
    Researchers teach an AI to write better chart captions
    A new dataset can help scientists develop automatic systems that generate richer, more descriptive captions for online charts.  ( 10 min )
  • Open

    Democratize computer vision defect detection for manufacturing quality using no-code machine learning with Amazon SageMaker Canvas
    Cost of poor quality is top of mind for manufacturers. Quality defects increase scrap and rework costs, decrease throughput, and can impact customers and company reputation. Quality inspection on the production line is crucial for maintaining quality standards. In many cases, human visual inspection is used to assess the quality and detect defects, which can […]  ( 10 min )
  • Open

    Data assets in the AI era
    Here’s a hypothesis: Smart data (enriched with metadata that makes it connectable with other data, Tinker Toy style)  might be fungible in ways that dumb data isn’t.  For exchange and reuse purposes, can data be “fungible” at all? Consider this excerpt from an explainer on the Cointelegraph.com site: “Fungible tokens or assets are divisible and… Read More »Data assets in the AI era The post Data assets in the AI era appeared first on Data Science Central.  ( 21 min )
    The music of the Riemann Hypothesis: Sound Generation in Python
    Not long ago, I published an article entitled “The Sound that Data Makes”. The goal was turning data — random noise in this case — into music. The hope was that by “listening” to your data, you could gain a different kind of insights, not conveyed by visualizations or tabular summaries. This article is a… Read More »The music of the Riemann Hypothesis: Sound Generation in Python The post The music of the Riemann Hypothesis: Sound Generation in Python appeared first on Data Science Central.  ( 22 min )
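The basic recipe the article describes, mapping a data series to pitches and rendering audio, fits in a few lines of Python; a sketch, with an arbitrary pitch range and note length:
```
import numpy as np
from scipy.io import wavfile

def sonify(values, sr=44100, note_dur=0.2, f_lo=220.0, f_hi=880.0):
    v = (values - values.min()) / (values.ptp() + 1e-12)  # normalize to [0, 1]
    freqs = f_lo * (f_hi / f_lo) ** v                      # log-spaced pitch mapping
    t = np.linspace(0, note_dur, int(sr * note_dur), endpoint=False)
    audio = np.concatenate([np.sin(2 * np.pi * f * t) for f in freqs])
    wavfile.write("data_music.wav", sr, (audio * 32767).astype(np.int16))

sonify(np.random.randn(64))  # "listen" to random noise, as in the earlier article
```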

  • Open

    Non-sentient AI future
Watching all these AI-generated ads and whatnot, I find them incredibly disturbing; serious uncanny valley vibes. I keep having this horrible thought about the future: humans no longer exist (for whatever reason, not necessarily wiped out by AI), but our world continues much as it did before, except every "person" is an AI. Through machine learning over unimaginable numbers of observations of real human behavior, they are able to perfectly replicate a functioning human being in appearance and actions. You could live in this world your whole life and never know the difference. Aliens could contact the AI and never know either. Except they are not sentient. They have no self-awareness or actual thinking at all, just perfect impressions of actual humans, doing things and reacting to things based on their learned data. It's absolutely terrifying. I'm reminded of the Westworld quote: "If you can't tell the difference, does it matter?" Maybe it doesn't. submitted by /u/cheezum5000 [link] [comments]  ( 8 min )
    [Give us Feedback] Recast AI: Turn your want-to-read articles into rich audio summaries.
    Hey folks, I am one of the co-founders of Recast, an AI-powered service that helps you enjoy any text article on the web in an easier, faster way. In brief, Recast turns any text you care about into a short, news-style explainer podcast using AI. Much more than a summary, our virtual hosts actually have a back-and-forth conversation, which makes all the difference. You can submit your own, or discover what other users have already recast. Recasts are shorter than reading the whole article would be, plus listening is often a much more palatable way of consuming knowledge, as you can listen to a Recast on a walk, on the commute, while doing the dishes, etc. Here’s a couple of Recasts you can listen to directly: [NYTimes] In Norway, the Electric Vehicle Future Has Already Arrived [DataDrivenInvestor] Dear Sam Altman- There was never an era of making models bigger [MIT Tech Review] China isn’t waiting to set down rules on generative AI [The Egg And The Rock] In cosmology, all our errors lean the same way. The implications are... interesting We'd love to get your feedback! submitted by /u/ekurutepe [link] [comments]  ( 8 min )
    The face of someone having the perfect poo according to AI
    submitted by /u/just_quit_smoking [link] [comments]  ( 8 min )
    How long is the musiclm waitlist?
I went on the waitlist on the 14th of May and I've still been waiting; the idea is so good and I want to have fun with it. Someone I know joined the waitlist YESTERDAY and already has access, so I don't know what's happening. Any explanation? submitted by /u/Ctoousiwcasasi [link] [comments]  ( 8 min )
    If Elon Musk and Mark actually fight, this is probably how the conversation would go.
    submitted by /u/3nd4u [link] [comments]  ( 8 min )
    My boss has told me to read a 200 page annual report and identify every sentence that has a "numerical or empirical statement" in it and highlight. I have to think this would be easy to do with AI but am hitting a wall. Any tools that you would recommend I check out?
I think he's looking for statements ranging from the very obvious (revenue rose 10%) to the less clear (we opened a new facility during the fiscal year), if that makes any difference. Thank you for your help; I'm facing like 3 days of reading right now lol. submitted by /u/Dr_Work_Cr_Soul [link] [comments]  ( 8 min )
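A crude first pass needs no AI at all: extract the text and flag sentences containing digits or quantity words. A sketch using pypdf (the filename and the word list are placeholders; implicit empirical claims like "we opened a new facility" would still need an LLM pass or manual review):
```
import re
from pypdf import PdfReader

DIGITS = re.compile(r"\d")
WORDS = re.compile(
    r"\b(one|two|three|ten|hundred|thousand|million|percent|"
    r"increase|decrease|rose|fell|grew|declined)\b", re.I)

reader = PdfReader("annual_report.pdf")  # placeholder filename
text = " ".join(page.extract_text() or "" for page in reader.pages)
sentences = re.split(r"(?<=[.!?])\s+", text)  # naive sentence splitter

for s in sentences:
    if DIGITS.search(s) or WORDS.search(s):
        print(s)
```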
    Advice on AI Needed
Hi everyone! Although I am not a programmer of any sort, I've been wanting to start a venture that relies on the use of AI. I need to make a bot that can read pictures and use its database to categorize/diagnose what it is seeing. I've made some contacts and have vague backing for the data so far, but right now building the bot is my utmost priority. I've never really owned or created any machine learning system, and I don't know where to start or whether I even have the capabilities. submitted by /u/XxxMeowMeowPurrxxX [link] [comments]  ( 8 min )
    AI Buildings Series [workflow in comments]
    submitted by /u/PeePeePeePooPooPooo [link] [comments]  ( 8 min )
    AI to review, critique and advise for design files?
Hi there, I'm wondering if anyone knows of an AI that I can give a graphic with some context (e.g. "this will be used as a printed business card in the size x * y") and have it give me hints and pointers on what it believes could be done better? Thanks in advance submitted by /u/meerjungfrauman [link] [comments]  ( 8 min )
    Have you tried any AI chat language learning tools? What did you think?
I just tried a language learning tool called "gopenpal.ai" where you can chat with an AI in your target language. It has built-in translations, you can click on words to see their definitions, and it also corrects your writing. I liked that before you enter the chat you can choose the level of difficulty you want the conversation to be in (from A1 to C2). I thought it was pretty good but could do with some more features, like links to online dictionaries for each word you click on (like you get on LingQ). Also, as a beginner Italian learner, I don't know how correct the AI's messages and the corrections it offers are. Anyone here tried similar sites? What did you think? submitted by /u/krampster2 [link] [comments]  ( 8 min )
    Jason Bateman in American Psycho
Used Roop and ElevenLabs, and edited in Premiere Pro submitted by /u/AthleteEducational63 [link] [comments]  ( 8 min )
    Is there an AI where I can give it lyrics and tell it what style song I want and have it produce a music file?
I'm working on a school project for which I'm creating a fake band webpage, and I really wanted to give it that extra oomph by adding some AI-generated music. It needs to be metal music sung by female vocalists. submitted by /u/VirtualAdeptGirl [link] [comments]  ( 8 min )
    Sexually explicit open sourced language models? Do they exist? (asking for a friend of friend)
Hey y'all. Wondering if anyone knows of any open-source models that are on par with GPT-3.5 (or close to it). Purely for research purposes. submitted by /u/chriscarmy [link] [comments]  ( 8 min )
    Video AI Learning path ?
Hi, I'm a complete newbie to this. Can someone please share learning paths / documents / tutorials etc. for training something related to videos? E.g. feeding in good-looking sky images > identifying the sky in a given video > replacing it with a better sky, or similar tasks. submitted by /u/sdw23 [link] [comments]  ( 8 min )
    One-Minute Daily AI News 6/28/2023
- AlphaGo from Google’s DeepMind AI lab made history by defeating a champion player of the board game Go. Now Demis Hassabis, DeepMind’s cofounder and CEO, says his engineers are using techniques from AlphaGo to make an AI system dubbed Gemini that will be more capable than that behind OpenAI’s ChatGPT.[1]
- As anyone who’s seen depictions of AI in movies like 2001: A Space Odyssey and Alien will know, you simply don’t put your life control system in the hands of a sentient computer. Now, though, NASA is seemingly going against everything Hollywood has taught us about AI space assistants by developing a system that will allow astronauts to use a natural-language ChatGPT-like interface in space.[2]
- A team of researchers, including professors from the University of Montana and UM Western, have found that OpenAI’s GPT-4 scored in the top 1% on the Torrance Tests of Creative Thinking (TTCT), matching or outperforming humans in the creative abilities of fluency, flexibility, and originality.[3]
- Shares of U.S. chipmakers fell on Wednesday following reports that the Biden administration was planning new curbs on the export of computing chips for artificial intelligence to China as early as July.[4]
Sources:
[1] https://www.wired.com/story/google-deepmind-demis-hassabis-chatgpt/
[2] https://interestingengineering.com/innovation/nasa-chatgpt-ai-lunar-gateway
[3] https://nbcmontana.com/news/local/um-um-western-researchers-find-openais-gpt-4-outperforms-humans-in-creativity-tests
[4] https://www.reuters.com/markets/europe/chip-stocks-drop-us-mulls-fresh-curbs-ai-access-china-2023-06-28/
submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    a Voice changer worth investing in?
I am working on a cartoon show for my brand's YouTube channel, and now that I have designed all the characters and written a few episodes (about 6), I am having a tough time choosing the voices for each character. I have been going over multiple voice apps like Voicemod, etc. To be honest, I am not very impressed: people have been hyping them up, but they sound very fake and broken, they skip words, and the quality is just not the best, at least on the freemium versions. I am going to have to purchase a premium version and I am okay with that. I was just wondering if anyone has purchased any of the voice generation apps and can recommend which one is making the most strides forward. submitted by /u/itsokmydadisrich [link] [comments]  ( 8 min )
Has anyone heard any updates on UBI research
A long while back I heard that OpenAI was researching UBI, but I haven't heard ANYTHING since about what they are finding, or whether they just asked GPT-3 questions and called that the "research". And I'm not hearing much about any other research on UBI. Has anyone heard any updates on this from OpenAI? What about other places? submitted by /u/crua9 [link] [comments]  ( 8 min )
  • Open

    "Monte-Carlo Planning in Large POMDPs", Silver & Veness 2010
    submitted by /u/gwern [link] [comments]  ( 8 min )
    Can I make a customized robot in OpenAI gym?
I am running a code project based on OpenAI Gym. However, the project initially uses ant robots, which makes it less convincing for later research. I want to replace the ant robots with more realistic models, for example a TurtleBot or a Clearpath robot. Does anyone know if this is possible? If yes, where should I start? Should I build a model from scratch, or do there exist ones I could use directly? Any suggestion is appreciated! submitted by /u/Iton608 [link] [comments]  ( 8 min )
    How much do people care about Gym/gymnasium environment compatibility?
I've written my own multiagent grid world environment in C with a nice real-time visualiser (with OpenGL) and am thinking of publishing it as a library. The step function call works basically exactly the same as in Gym, but that's basically where the similarities end. Due to the way I implemented it, it will probably be a pain to get it fully compatible with Gym. Do people really care that much about Gym compatibility? submitted by /u/askii717 [link] [comments]  ( 8 min )
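For reference, the interface such a wrapper would have to match is small; a minimal Gymnasium environment (a hypothetical single-agent grid world for illustration, since the multiagent API differs):
```
import gymnasium as gym
import numpy as np
from gymnasium import spaces

class GridWorldEnv(gym.Env):
    """Hypothetical single-agent grid world matching the Gymnasium API."""

    def __init__(self, size=8):
        self.size = size
        self.observation_space = spaces.MultiDiscrete([size, size])
        self.action_space = spaces.Discrete(4)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self._pos = np.array([0, 0])
        return self._pos.copy(), {}  # (observation, info)

    def step(self, action):
        moves = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]])
        self._pos = np.clip(self._pos + moves[action], 0, self.size - 1)
        terminated = bool((self._pos == self.size - 1).all())
        reward = 1.0 if terminated else -0.01
        # (observation, reward, terminated, truncated, info)
        return self._pos.copy(), reward, terminated, False, {}
```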
    Invariance of episode length - rewards
First, I use a Q-learning-based method (DDQN). I have a task that can take up to M steps, and I have a proxy value function that is actually my reward (it was trained before MY value function). Every K timesteps, I get a reward according to the proxy value function evaluated on the relevant state. It sounds like a strange method, but it helps with sample efficiency, and the task is very challenging to learn from scratch. Now, the issue: I can pick rewards that are strictly positive, strictly negative, or mixed. When I use mixed rewards, I observe weird behaviors, such as doing the task well at first and then starting to perform badly, probably due to negative expected rewards in the future (it's better to die now, because it's going to be bad later). If I use positive rewards, things seem more stable; however, I think the agent would then tend to prolong episodes. I thought about dividing the returns by the trajectory (episode) length: is that a problematic approach? Are there any papers you are aware of that discuss this issue? Going greedy does not help me; I see that discount factors of 0.99-1 work better for my use case (probably due to some bias and representation issues and a lack of long-term planning; I don't know, it's all black magic, but states can look similar to the network while rewards are still very different towards the end of the task). TLDR: I want the agent to be invariant to episode length; how should I shape my rewards? I understand this question is very env-specific, but I would mainly appreciate references to papers related to this issue; I am not looking for "a solution" LOL. submitted by /u/pyepyepie [link] [comments]  ( 9 min )
    Is there an implementation of non-deep RL algorithms based on Stable Baselines3?
    Hi, I'm currently working on satisficing RL, which can be roughly described as obtaining a reward of 10 instead of maximizing the reward. For my current approach, I have built upon the Stable Baselines3 (SB3) Deep Q-learning algorithm. However, I occasionally encounter some unexpected results. To determine whether these issues are related to deep learning or our satisficing framework, I would like to test satisficing using classical Q-learning. Has anyone already utilized the SB3 framework to implement Q-learning? Alternatively, do you know of a good Q-learning implementation that is compatible with Gymnasium/Gym discrete environments ? Thank you. submitted by /u/Butanium_ [link] [comments]  ( 8 min )
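For Gymnasium discrete environments, tabular Q-learning is short enough that a full framework may not be needed; a minimal sketch (FrozenLake-v1 as a stand-in environment; a satisficing variant would replace the max backup with its own target):
```
import gymnasium as gym
import numpy as np

env = gym.make("FrozenLake-v1", is_slippery=False)
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, eps = 0.1, 0.99, 0.1

for episode in range(5000):
    s, _ = env.reset()
    done = False
    while not done:
        # epsilon-greedy action selection
        a = env.action_space.sample() if np.random.rand() < eps else int(Q[s].argmax())
        s2, r, terminated, truncated, _ = env.step(a)
        done = terminated or truncated
        # standard Q-learning target; a satisficing criterion would go here instead
        target = r + (0.0 if terminated else gamma * Q[s2].max())
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2
```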
    MADDPG: reward drops fast after updating ac network
I'm using MADDPG to do computation offloading. But the reward drops fast once I start sampling from the replay buffer. https://preview.redd.it/ukk82xkeav8b1.png?width=801&format=png&auto=webp&s=8ca2cc8a68dd4d9048d02ba58df30df0afbe42d8
```
class critic(abstract_agent):

    def __init__(self, obs_shape, act_shape):
        super().__init__()
        self.LRelu = nn.LeakyReLU(0.01)
        self.linear_c1 = nn.Linear(act_shape + obs_shape, 64)
        self.linear_c2 = nn.Linear(64, 64)
        self.linear_c = nn.Linear(64, 1)

    def forward(self, obs_input, act_input):
        x_cat = self.LRelu(self.linear_c1(torch.cat([obs_input, act_input], dim=1)))
        x = self.LRelu(self.linear_c2(x_cat))
        x = self.linear_c(x)
        return x


class actor(abstract_agent):

    def __init__(self, num_input, action_size):
        super().__init__()
        s…
```
 ( 9 min )
  • Open

    [D] Toy Datasets and Models to experiment with Knowledge Distillation
    I am planning to conduct research on knowledge distillation, specifically exploring certain mathematical characteristics of the technique. Consequently, I will need to conduct numerous experiments. Considering my constraint of a 30-hour weekly limit on Kaggle, I am aiming to minimize training time. Could you kindly recommend suitable models and datasets for my research? submitted by /u/MazenAmria [link] [comments]  ( 8 min )
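For context, the core of standard knowledge distillation fits in a few lines, which also keeps experiments cheap on small datasets such as MNIST or CIFAR-10 with small CNNs; a sketch of the usual Hinton-style loss (the temperature and mixing weight are conventional defaults, not recommendations):
```
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # soft targets: KL between temperature-softened distributions, scaled by T^2
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # hard targets: ordinary cross-entropy on the true labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```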
    [R] Adversarial Robust Deep Reinforcement Learning Requires Redefining Robustness
    https://ojs.aaai.org/index.php/AAAI/article/view/26009/25781 submitted by /u/ml_dnn [link] [comments]  ( 8 min )
    [D] Without the hype: How do current state-of-the-art LLMs benefit society?
This is in contrast to obvious harms of current LLMs, like uses for information warfare and election interference, and "far-out" benefits of potentially more powerful future models. submitted by /u/optimized-adam [link] [comments]  ( 8 min )
    [R] Training Transformers with 4-bit Integers - Haocheng Xi et al Tsinghua University - 2.2 times faster than the FP16 counterparts and speeds up the training by up to 35.1%!
    Paper: https://arxiv.org/abs/2306.11987 Github: https://github.com/xijiu9/Train_Transformers_with_INT4 Abstract: Quantizing the activation, weight, and gradient to 4-bit is promising to accelerate neural network training. However, existing 4-bit training methods require custom numerical formats which are not supported by contemporary hardware. In this work, we propose a training method for transformers with all matrix multiplications implemented with the INT4 arithmetic. Training with an ultra-low INT4 precision is challenging. To achieve this, we carefully analyze the specific structures of activation and gradients in transformers to propose dedicated quantizers for them. For forward propagation, we identify the challenge of outliers and propose a Hadamard quantizer to suppress the outliers. For backpropagation, we leverage the structural sparsity of gradients by proposing bit splitting and leverage score sampling techniques to quantize gradients accurately. Our algorithm achieves competitive accuracy on a wide range of tasks including natural language understanding, machine translation, and image classification. Unlike previous 4-bit training methods, our algorithm can be implemented on the current generation of GPUs. Our prototypical linear operator implementation is up to 2.2 times faster than the FP16 counterparts and speeds up the training by up to 35.1%. https://preview.redd.it/jqe4ickl809b1.jpg?width=1348&format=pjpg&auto=webp&s=64ab9fd77b8fbd929db65da5c89668dd1039d5c8 https://preview.redd.it/hvldoekl809b1.jpg?width=1641&format=pjpg&auto=webp&s=879a96da6c7b5654af2d899013515a27c3d1736c https://preview.redd.it/jvu6bekl809b1.jpg?width=1514&format=pjpg&auto=webp&s=02393a7d7fc91ce35dfea16b741d8ccb79bcf0fe ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [P] MultiEL: Multilingual Entity Linking model by BELA model
Hey everyone, I created a simple Python package to use the BELA model for entity linking in 98 languages!!! It uses the Bi-encoder Entity Linking Architecture (BELA) from Meta. The model is MIT licensed!!! GitHub: https://github.com/wannaphong/MultiEL Demo: https://github.com/wannaphong/MultiEL/blob/main/notebook/test.ipynb submitted by /u/wannaphong [link] [comments]  ( 8 min )
    [Project] Prototype - (Auto) Codebase to Video for a given perspective. Test case: Pybads and Project Management
    Video: https://youtu.be/jNwRqcrz6rg A video created by an app built on top of the BabyDragon package(still in dev) - it is still in the very early stages of development and the pipeline is currently a bit janky but I would love to hear any feedback/suggestions even if it is just nitpicking. Thank you. Source Code for test repo: https://github.com/acerbilab/pybads submitted by /u/NovelspaceOnly [link] [comments]  ( 8 min )
    [N] Beware of Unreliable Data in Model Evaluation: Case Study w/ Flan T5 LLM
    Hello Redditors! LLMs have undoubtedly established themselves as the vanguards of natural language processing, consistently pushing the limits of language comprehension and generation, thus solidifying their prominent position in the field. It's pretty common now for data scientists and ML engineers to validate the quality of their training data being fed into these LLMs, but what about their test data used to evaluate them? I spent some time playing around with the FLAN-T5 open-source LLM from Google Research and I discovered that noisy test/evaluation data can actually cause you to choose sub-optimal prompts. Using the observed accuracy, you would select Prompt A as better. Prompt B is actually the better prompt when evaluated on the clean test set. Given two prompts A and B, I found multiple cases where prompt A performed better on the observed (noisy) test data, yet worse on the high-quality test data. In reality, this means that you would choose A as the "best prompt" when prompt B is actually the better one. I also proved the accuracy difference to be significant via McNemar’s Test. I wrote up a quick article that explains my methodology and how I used data-centric AI to automatically clean the noisy test data in order to ensure optimal prompt selection. Let me know what you think! submitted by /u/cmauck10 [link] [comments]  ( 9 min )
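As a reference for the significance check mentioned above, McNemar's test compares two classifiers (here, two prompts) on the same examples via their disagreement counts; a sketch with hypothetical correctness arrays:
```
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# hypothetical per-example correctness of prompts A and B on the same test set
correct_a = np.array([1, 1, 0, 1, 0, 1, 1, 0])
correct_b = np.array([1, 0, 1, 1, 1, 1, 0, 1])

# 2x2 contingency table: rows = A right/wrong, columns = B right/wrong
table = [
    [np.sum((correct_a == 1) & (correct_b == 1)), np.sum((correct_a == 1) & (correct_b == 0))],
    [np.sum((correct_a == 0) & (correct_b == 1)), np.sum((correct_a == 0) & (correct_b == 0))],
]
print(mcnemar(table, exact=True).pvalue)
```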
    [D] Cluster Algorithm - Successively Approach
A company offers 5 types of services (j = 5), where S = [1, ..., j]. To carry out the different services, the company has 5 types of equipment (i = 5), where E = [1, ..., i]. I would like to use the table presented below to perform a cluster analysis, but successively: first cluster the equipment according to the services offered, then cluster the services according to the equipment used, and repeat the process until it converges to a common cluster result. The ultimate goal is to obtain a clustering that is a compromise between the two approaches described previously. Is there any clustering algorithm that is capable of accomplishing this task? submitted by /u/Virtual-Hall8707 [link] [comments]  ( 8 min )
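Jointly clustering the rows and columns of such a matrix is what biclustering / co-clustering methods do, and scikit-learn's SpectralCoclustering is one ready-made option (not the exact iterate-until-convergence scheme described); the usage matrix below is made up for illustration, since the post's table is not shown:
```
import numpy as np
from sklearn.cluster import SpectralCoclustering

# rows = equipment E1..E5, columns = services S1..S5; entries = hypothetical usage counts
X = np.array([
    [8, 7, 0, 1, 0],
    [6, 9, 1, 0, 0],
    [0, 1, 7, 8, 0],
    [1, 0, 9, 6, 1],
    [0, 0, 1, 0, 9],
])
model = SpectralCoclustering(n_clusters=3, random_state=0).fit(X)
print("equipment clusters:", model.row_labels_)
print("service clusters:  ", model.column_labels_)
```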
    [P] AI image generation without copyright infringement
Yesterday, news broke of Microsoft and OpenAI being sued over the unconstrained data scraping used to train ChatGPT (https://www.theregister.com/2023/06/28/microsoft_openai_sued_privacy/). In this context, I want to highlight the work we're doing to build a billion-size free-to-use Creative Commons image dataset which can be used to train generative AI models like Stable Diffusion. While we are still working on the dataset, you can already read about our approach here: https://blog.ml6.eu/ai-image-generation-without-copyright-infringement-a9901b64541c We are currently scaling up our solution using Fondant (https://github.com/ml6team/fondant) and will open-source both the pipeline and resulting dataset in the near future. Any feedback would already be highly appreciated. submitted by /u/RobbeSneyders [link] [comments]  ( 8 min )
    [R] Deep Neural Networks for Rank-Consistent Ordinal Regression Based On Conditional Probabilities
    submitted by /u/seraschka [link] [comments]  ( 8 min )
    [R] SAITS: Self-Attention-based Imputation for Time Series. Expert Systems with Applications, 219:119619, 2023.
Missing values are common and pesky in time series collected from the real world. A new, fast, state-of-the-art neural network called "SAITS" is proposed to help impute missing data in partially-observed multivariate time series. The paper has been peer-reviewed and published in the journal Expert Systems with Applications (DOI link). The full paper is available on arXiv at this URL. The code is open source on GitHub https://github.com/WenjieDu/SAITS/. 👀 NOTE: If your research lies in time-series modeling, you may also be interested in the work PyPOTS: a Python toolbox for data mining on Partially-Observed Time Series. Its full paper is available on arXiv as well https://arxiv.org/abs/2305.18811, and has been peer-reviewed and accepted by the 9th SIGKDD international workshop Mining and Learning from Time Series (MiLeTS'23). submitted by /u/WenjayDu [link] [comments]  ( 8 min )
    [D] Prep for Anthropic interview?
    Hi all, currently prepping for a research engineer interview at Anthropic, and I was wondering if there's any people here who have interviewed before who can speak to what that process was like. Haven't found too much online so far. What ML topics / study resources are most important for me to prep? (For context, I've actually almost never interviewed for ML roles before, but have done tons of quant interviews, so stats/data science/talking about my current ML research are the potential questions I'm pretty comfortable with.) How crucial is EA to Anthropic still? I wouldn't necessarily call myself an effective altruist so would that be a ding? submitted by /u/riterbi [link] [comments]  ( 8 min )
    [D] First job opinion?
Hi guys! I recently got an internship offer and I would like to know your opinions about the job. The company is one of the biggest telecommunications companies in Europe, and they want me to work on a project that consists of integrating GPT-3 into their customer service chatbot. I am excited about this opportunity, but my main goal is to learn as much as I can during my internship, and I don't know if working on that project is worth it. What do you think? submitted by /u/dani_blz [link] [comments]  ( 8 min )
    [R] Advice on my thesis project
Hi everyone, I am a final-year student in a master's program in matter physics, and I am doing my thesis project, which is to apply ML methods for optimization on different measurements of perovskite-based cells with a fixed structure, varying only the layer of the 2D passivant, which helps improve the efficiency of the cell. Specifically, I collected data on 9 different cations, each with 3 different concentrations x 2 different annealing temperatures; for each combination I have 3 devices, on which I perform 16 measurements. Having obtained the full database of measurements, I filtered it (roughly 2500 measurements remained) to remove the measurements on bad pins (this happens often with perovskites) and trained 6 models (linear regress, k-nearest neighbor, random …  ( 10 min )
  • Open

    Announcing the first Machine Unlearning Challenge
Posted by Fabian Pedregosa and Eleni Triantafillou, Research Scientists, Google. Deep learning has recently driven tremendous progress in a wide array of applications, ranging from realistic image generation and impressive retrieval systems to language models that can hold human-like conversations. While this progress is very exciting, the widespread use of deep neural network models requires caution: as guided by Google’s AI Principles, we seek to develop AI technologies responsibly by understanding and mitigating potential risks, such as the propagation and amplification of unfair biases and protecting user privacy. Fully erasing the influence of the data requested to be deleted is challenging since, aside from simply deleting it from databases where it’s stored, it also requires …  ( 93 min )
    On-device diffusion plugins for conditioned text-to-image generation
Posted by Yang Zhao and Tingbo Hou, Software Engineers, Core ML. In recent years, diffusion models have shown great success in text-to-image generation, achieving high image quality, improved inference performance, and expanding our creative inspiration. Nevertheless, it is still challenging to efficiently control the generation, especially with conditions that are difficult to describe with text. Today, we announce MediaPipe diffusion plugins, which enable controllable text-to-image generation to be run on-device. Expanding upon our prior work on GPU inference for on-device large generative models, we introduce new low-cost solutions for controllable text-to-image generation that can be plugged into existing diffusion models and their Low-Rank Adaptation (LoRA) variants. Text-t…  ( 92 min )
  • Open

    Building Boba AI: Some lessons and patterns learnt in building an LLM-powered generative application
    submitted by /u/nickb [link] [comments]  ( 8 min )
    Keras
    Are there any general courses/tutorials like Kaggle courses for Python and Pandas that can give a general overview of Keras? Book recommendations are also fine. submitted by /u/bifrost44 [link] [comments]  ( 8 min )
    OpenOrca, an open-source dataset and series of instruct-tuned LLMs
    submitted by /u/nickb [link] [comments]  ( 8 min )
  • Open

    Recommend and dynamically filter items based on user context in Amazon Personalize
    Organizations are continuously investing time and effort in developing intelligent recommendation solutions to serve customized and relevant content to their users. The goals can be many: transform the user experience, generate meaningful interaction, and drive content consumption. Some of these solutions use common machine learning (ML) models built on historical interaction patterns, user demographic attributes, […]  ( 10 min )
    Interactively fine-tune Falcon-40B and other LLMs on Amazon SageMaker Studio notebooks using QLoRA
    Fine-tuning large language models (LLMs) allows you to adjust open-source foundational models to achieve improved performance on your domain-specific tasks. In this post, we discuss the advantages of using Amazon SageMaker notebooks to fine-tune state-of-the-art open-source models. We utilize Hugging Face’s parameter-efficient fine-tuning (PEFT) library and quantization techniques through bitsandbytes to support interactive fine-tuning of […]  ( 9 min )
  • Open

    Breaking cross-modal boundaries in multimodal AI: Introducing CoDi, composable diffusion for any-to-any generation
    Imagine an AI model that can seamlessly generate high-quality content across text, images, video, and audio, all at once. Such a model would more accurately capture the multimodal nature of the world and human comprehension, seamlessly consolidate information from a wide range of sources, and enable strong immersion in human-AI interactions. This could transform the […] The post Breaking cross-modal boundaries in multimodal AI: Introducing CoDi, composable diffusion for any-to-any generation appeared first on Microsoft Research.  ( 10 min )
  • Open

    What Is Robotics Simulation?
    Robots are moving goods in warehouses, packaging foods and helping assemble vehicles  — when they’re not flipping burgers or serving lattes. How did they get so skilled so fast? Robotics simulation. Making leaps in progress, it’s transforming industries all around us. Robotics Simulation Summarized A robotics simulator places a virtual robot in virtual environments to Read article >  ( 11 min )
    Calm, Cool and Creative: MUE Studio Showcases 3D Scenes ‘In the NVIDIA Studio’
    MUE Studio, founded by 3D artists Minjin Kang and Mijoo Kim, specializes in art direction, photography and 3D design for campaigns and installations.  ( 7 min )
    ‘Remnant II’ Headlines 14 Games Joining GeForce NOW in July
    It’s a jam-packed July with 14 newly supported titles in the GeForce NOW library, including Remnant II from Gunfire Games and Gearbox Publishing. Need a new adventure? Check out the nine additions streaming from the cloud this week. Plus, the Steam Summer Sale kicks off this week, and many supported titles in the GeForce NOW Read article >  ( 5 min )
  • Open

    Aligning IT infrastructure with business objectives
Introduction: In the modern business landscape, aligning IT infrastructure with business objectives is desirable and indispensable. At a time when data is being dubbed the ‘new oil,’ managing it has become critical to achieving strategic business objectives. But how can businesses leverage their IT assets to serve their goals better? This is where the alignment… Read More »Aligning IT infrastructure with business objectives  The post Aligning IT infrastructure with business objectives  appeared first on Data Science Central.  ( 22 min )
    Will coding be a collaborative experience using GitHub Copilot? – Part two
    In the first part of this blog, we discussed how coding could be a collaborative experience using tools like the GitHub Copilot. In the second part, we will explore the impact and significance of collaboration due to GitHub Copilot on the wider developer ecosystem. As we have seen, developers have a specific definition of collaboration… Read More »Will coding be a collaborative experience using GitHub Copilot? – Part two The post Will coding be a collaborative experience using GitHub Copilot? – Part two appeared first on Data Science Central.  ( 21 min )
    Navigating the future of learning: AI, certification, and higher education
    The world of higher education is undergoing a transformative shift as artificial intelligence (AI) continues to reshape various aspects of our society. From classrooms to career development, the integration of AI and its impact on learning is undeniable. In this article, we will explore the intersection of AI, certification, and higher education, and delve into… Read More »Navigating the future of learning: AI, certification, and higher education The post Navigating the future of learning: AI, certification, and higher education appeared first on Data Science Central.  ( 27 min )
  • Open

    Multiplication via parabola
    Here’s a graphical way to multiply two positive numbers a and b using the parabola y = x². Start at the origin, move a units to the left, then go up vertically to the parabola, and draw a point. Go back to the origin, move b units to the right, go up vertically to the […] Multiplication via parabola first appeared on John D. Cook.  ( 5 min )
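The excerpt cuts off mid-construction, but the classic version of this trick finishes by drawing the chord through the two marked points and reading the product off the y-axis. Assuming that ending, the arithmetic is one line: the chord through $(-a, a^2)$ and $(b, b^2)$ has slope $\frac{b^2 - a^2}{b - (-a)} = b - a$, so its y-intercept is $a^2 + (b - a)\cdot a = ab$. The height at which the chord crosses the y-axis is exactly the product of the two numbers.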
  • Open

    Generating 3D Molecular Conformers via Equivariant Coarse-Graining and Aggregated Attention
Figure 1: CoarsenConf architecture. (I) The encoder $q_\phi(z| X, \mathcal{R})$ takes the fine-grained (FG) ground truth conformer $X$, RDKit approximate conformer $\mathcal{R}$, and coarse-grained (CG) conformer $\mathcal{C}$ as inputs (derived from $X$ and a predefined CG strategy), and outputs a variable-length equivariant CG representation via equivariant message passing and point convolutions. (II) Equivariant MLPs are applied to learn the mean and log variance of both the posterior and prior distributions. (III) The posterior (training) or prior (inference) is sampled and fed into the Channel Selection module, where an attention layer is used to learn the optimal pathway from CG to FG structure. (IV) Given the FG latent vector and the RDKit approximation, the decoder $p_\theta…  ( 5 min )
  • Open

    Insights from global conversations
    We are sharing what we learned from our conversations across 22 countries, and how we will be incorporating those insights moving forward.  ( 4 min )

  • Open

    [D] Anyone have experience swapping from a different industry by getting an online degree?
Hey guys, a couple of details about me: I have a B.S. in EE from an accredited and respected state school, and I've worked in process automation for 2 years. The programming I do is much more like playing with really advanced and customizable Legos and digital logic in STL than developing software from scratch. I do have programming experience in C++ and Python; however, it is recreational. I'm currently finishing up that popular Stanford Online Machine Learning Specialization on Coursera with Andrew Ng; the lectures and labs are easy to follow and understand. I've come to the conclusion that if I truly want to swap industries to ML, then I would most likely need an M.S., and there are plenty of online programs being offered by prestigious schools. However, I'm hesitant, as the only reason I got my current job was through an undergraduate internship with a competitor, and an online program would make that kind of connection much more difficult, if I had to guess. Has anyone had any luck scoring an entry-level job from an online degree alone? submitted by /u/AbsolutelyN0thing [link] [comments]  ( 8 min )
    [Discussion] Is there a better way than positional encodings in self attention?
    From my understanding of Attention is all you need, each entity in a self attention network runs a dot product of its key vector with the query vector of every other entity. Therefore for an entity to know its position relative to the position of other entities, the input entities (i.e. the word embeddings) need to be augmented with positional encodings. This is done by adding sine and cosine values of the word's position in the input stream. Question: is there no better way to get positional awareness? Adding sine and cosine values to input embeddings seems wasteful/inefficient. submitted by /u/taxed_2_death [link] [comments]  ( 8 min )
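For reference, the sinusoidal scheme from the paper is only a few lines; a NumPy sketch (an even d_model is assumed):
```
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(...), per Vaswani et al."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# the encodings are added (not concatenated) to the word embeddings
embeddings = np.random.randn(128, 512)
inputs = embeddings + sinusoidal_positional_encoding(128, 512)
```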
    [D] How to get internship in AI/ML domain? Please help.
Please help; I'm looking for internships in the AI/ML domain. I'm a third-year CS student at university, working in the field of AI/ML. I have experience with deep learning, computer vision, OpenCV, MediaPipe, chatbot development, and app development, and I have also made several projects in these fields. Most of my friends have internships, whereas I'm struggling to get one. Please tell me what kind of projects or courses I should do to land something. Are any startups or organizations here looking for interns in the AI/ML domain? I know that I'm a hardworking and passionate person who can fit in at a startup. I can also do unpaid work if I get guidance and a chance to learn more. In case there are any, please let me know and I can send in my resume for review. Thank you for the help :) submitted by /u/astrid_x_ [link] [comments]  ( 8 min )
    How do I use the inception_v3 model and tensorflow for image classification? [D]
I used to write my own models for this one project I'm doing, but the results weren't great, so I want to switch to a premade model; however, I don't know how to train it on my own images. submitted by /u/FriendshipThis1234 [link] [comments]  ( 8 min )
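The standard recipe here is transfer learning: load InceptionV3 with ImageNet weights, freeze it, and train a new classification head on a folder of your images. A Keras sketch (the directory layout, class count, and epoch count are placeholder choices):
```
import tensorflow as tf

NUM_CLASSES = 5  # number of your image categories

base = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, input_shape=(299, 299, 3))
base.trainable = False  # freeze the pretrained weights

inputs = tf.keras.Input(shape=(299, 299, 3))
x = tf.keras.applications.inception_v3.preprocess_input(inputs)
x = base(x, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

# expects data/train/<class_name>/*.jpg
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train", image_size=(299, 299), batch_size=32)

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)
```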
    [D] How would you approach this scenario? (model and implementation)
Hello fellow friends,
INTRODUCTION
I am an economist transitioning into marketing. My area of expertise is creating and validating models on really small time-series or panel datasets, for governments. Now I want to move more into 'big data' analysis, so I started a side project: an app with ads.
PROBLEM
I have an app and website as a side project, and I want to optimize the conversion rate in the app. Currently: I collect approximately 20 values per user (like IP, geo, device, product desired, etc.). I have about 10 possible "ad companies", each with 10 "themes", and I want to show a specific user the best "ad company" and "theme" to optimize conversion based on their specific characteristics. From the ad companies I get info for every conversion (about 10 more values). My traffic is low: only about 50k users visit the website, only a few thousand check the ad, and I get only about 30 conversions.
WHAT I'M TAKING INTO CONSIDERATION
1) Performance of the ad companies will vary drastically over time (e.g. if they don't have a good ad for a good user in a specific month).
2) I want to get the best option for each user, not just rank the options by overall performance, so I think A/B testing is not appropriate.
3) I want an approach where the less performant options keep being checked randomly and constantly (because their performance can increase at any time).
4) 95%+ of users are one-time users (this is a niche where recurring users are rare).
WHAT I THINK COULD BE DONE
1) I don't have any expertise in the AI area, but I think there must be models or approaches that can solve this, since I don't really need the models to be validated; rather, I want to keep them constantly evolving.
With all that said: how would you approach this problem? submitted by /u/yaodownload [link] [comments]  ( 9 min )
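Requirements (1)-(4) describe a textbook contextual bandit problem (the usual names are epsilon-greedy, Thompson sampling, or LinUCB). A minimal epsilon-greedy sketch over (company, theme) arms, with a crude user-segment key standing in for the 20 per-user features:
```
import random
from collections import defaultdict

ARMS = [(company, theme) for company in range(10) for theme in range(10)]
shows = defaultdict(int)        # (segment, arm) -> times shown
conversions = defaultdict(int)  # (segment, arm) -> conversions
EPS = 0.1                       # share of traffic reserved for re-checking weak arms

def choose_arm(segment):
    if random.random() < EPS:
        return random.choice(ARMS)  # constant random exploration, per requirement (3)
    # exploit: best smoothed conversion rate observed for this user segment
    def rate(arm):
        return (conversions[(segment, arm)] + 1) / (shows[(segment, arm)] + 2)
    return max(ARMS, key=rate)

def record(segment, arm, converted):
    shows[(segment, arm)] += 1
    conversions[(segment, arm)] += int(converted)
```
With only ~30 conversions, Thompson sampling with Beta priors would make better use of the data than these point estimates, but the overall structure stays the same.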
    [D] What role curates datasets at a small company?
Sorry if this isn’t the correct sub for this question. I work for a small startup that is not AI-focused, but we have about 8 different models that are used in various products. I am the lead developer on these projects and have created UIs for building, reviewing, and training datasets for these models. But adding additional data and training new models can still be time consuming. When I bring in other developers on the team to help with these tasks, they have the attitude that this kind of work is beneath them, because it is tedious and there is little to no coding involved. So my question is: how do small companies usually handle these tasks? Is it a dedicated role, or is it usually outsourced as data-entry-type work? submitted by /u/Anthony780 [link] [comments]  ( 8 min )
    [P] Unable to explicitly reproduce the best model trained by KerasTuner even with the same seed
Hello everyone, I am trying to train a simple neural network model by performing a hyperparameter random search using KerasTuner. I used the KerasTuner to perform 5 trials to search for the best set. Below are the code and the results.
```
EPOCHS = 30
random_search_tuner = kt.RandomSearch(
    build_model,
    objective="val_accuracy",
    max_trials=5,
    overwrite=True,
    directory="my_project",
    project_name="my_rnd_search",
    seed=7)
random_search_tuner.search(X_train, Y_train, epochs=EPOCHS, validation_data=(X_val, Y_val))
```
Output:
```
Trial 5 Complete [00h 04m 23s]
val_accuracy: 0.8891666531562805
Best val_accuracy So Far: 0.8891666531562805
Total elapsed time: 00h 19m 03s
```
Then, I took the best set to train the model myself from scratch. The issue here is I am unable to reproduce the results of the model…  ( 9 min )
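A common culprit for this kind of gap is nondeterminism outside the tuner's seed: GPU kernels, data shuffling, and global RNG state. A sketch of the TF-side settings one would typically set before the from-scratch retraining run (enable_op_determinism requires a recent TF, roughly 2.9+, and slows training):
```
import tensorflow as tf

tf.keras.utils.set_random_seed(7)               # seeds Python, NumPy, and TF RNGs in one call
tf.config.experimental.enable_op_determinism()  # forces deterministic (but slower) GPU kernels
```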
    [R] Is it good to say 'we will also include it in the camera-ready version?'
I am writing a rebuttal for a machine learning conference, answering questions asked by a reviewer. What do people here think of adding 'we will include it (the answer) in the camera-ready version if accepted' to the rebuttal document? How do you think meta-reviewers will react? submitted by /u/whereismycatyo [link] [comments]  ( 8 min )
HELLO FRIENDS [D]
Hope you can help me, or rather US, the beginners in this field. You know how hard it is to self-study something: there are a lot of resources out there, a bunch of courses, etc., especially when it comes to a subject like ML, which includes math. So it would be helpful if you could give a roadmap, all the essential elements to study, and some techniques. And thanks. submitted by /u/Playful-Diamond926 [link] [comments]  ( 8 min )
    [P] Bullet Points – news briefs written by AI, built with open-source LLMs
    Hi all -- I'm working on a project called Bullet Points, an AI-assistant to help you stay briefed on any topic in the news. I'd love to hear feedback from this community, in a comment here or a note to [contact@bulletpoints.news](mailto:contact@bulletpoints.news). Thanks for trying it out! The key motivation is that we all subscribe to human-curated email newsletters on various subjects. What if you could have an AI brief you on any topic in the news and get those briefs sent to your inbox? That's Bullet Points in a nutshell. My target users are professionals, wonks of all sorts, and infovores. You give Bullet Points a topic, and it shows you the top stories related to that topic, with AI-generated summaries and links to the underlying sources to dig further. If you like what you see, you can subscribe to daily or weekly briefs for that topic, delivered to your inbox. Whether you're a professional keeping up with industry news, a student researching a particular subject, or simply an informed citizen, Bullet Points is designed to keep you in the loop efficiently. I've implemented this project with open-source LLMs, running on CPUs. I made this choice to minimize costs, and because I wanted to learn how to work directly with LLMs. I'm using the following models: Instructor for embeddings (https://huggingface.co/hkunlp/instructor-large) BART, fine-tuned on the CNN Daily Mail dataset, for summarization (https://huggingface.co/facebook/bart-large-cnn) FLAN T-5 for curation (https://huggingface.co/google/flan-t5-base) And I'm using pgvector for vector search. Bullet Points scans thousands of news sources, aiming to cover a broad range of topics, regardless of your location in the world or your industry. However, for now, Bullet Points is focused on English-language news. submitted by /u/cpwalker100 [link] [comments]  ( 9 min )
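Of the pieces listed, the summarization step is the most self-contained to illustrate; a transformers sketch with the same BART checkpoint (the article text is a placeholder):
```
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
article = "..."  # full text of a news story goes here
summary = summarizer(article, max_length=130, min_length=30, do_sample=False)
print(summary[0]["summary_text"])
```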
    [P] PyPOTS: a Python toolbox for data mining on Partially-Observed Time Series
Due to all kinds of reasons, like failures of collection sensors, communication errors, and unexpected malfunctions, missing values are common to see in time series from the real-world environment. Whether we like them or not, missing data makes partially-observed time series (POTS) a pervasive problem in open-world modeling and prevents advanced data analysis. Although this problem is important, the area of data mining on POTS still lacks a dedicated toolkit. PyPOTS is created to fill in this gap. PyPOTS (pronounced "Pie Pots") is the first (and so far the only) Python toolbox/library specifically designed for data mining and machine learning on partially-observed time series (POTS), namely incomplete time series with missing values, a.k.a. irregularly-sampled time series, supporting the tasks of imputation, classification, clustering, and forecasting on POTS datasets. It was born to be a handy toolbox that makes data mining on POTS easy rather than tedious, helping engineers and researchers focus more on the core problems at hand rather than on how to deal with the missing parts in their data. PyPOTS will keep integrating classical and the latest state-of-the-art data mining algorithms for partially-observed multivariate time series. Besides the various algorithms, PyPOTS has unified APIs together with detailed documentation and interactive examples across algorithms as tutorials. Feedback, questions, and contributions are all very welcome! Website: https://pypots.com GitHub repo: https://github.com/WenjieDu/PyPOTS Paper link: https://arxiv.org/abs/2305.18811 Tutorials: https://github.com/WenjieDu/BrewPOTS Docs: https://docs.pypots.com submitted by /u/WenjayDu [link] [comments]  ( 9 min )
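A quick-start sketch of the kind of API the toolbox exposes, reconstructed from its README at the time (the constructor argument names may differ between versions, so check the docs before copying):
```
import numpy as np
from pypots.imputation import SAITS

# X: (n_samples, n_steps, n_features), with NaNs marking the missing values
X = np.random.randn(100, 48, 10)
X[np.random.rand(*X.shape) < 0.2] = np.nan

saits = SAITS(n_steps=48, n_features=10, n_layers=2, d_model=64, d_inner=64,
              n_heads=4, d_k=16, d_v=16, dropout=0.1, epochs=10)
saits.fit({"X": X})              # train on the partially-observed series
imputed = saits.impute({"X": X})  # fill in the missing entries
```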
    [D] Call for Presentations
    ** Self Promotion - This is my event ** The AI Conference 2023 is being held in SF in September. Our CFP is open for 3 more days. Submit a talk via our Call For Presenters here Here are a few of our confirmed speakers for this event: - Nazneen Rajani, Research Lead, Hugging Face - Peter Norvig, Director of Research, Google - Michele Catasta, VP of AI, Replit - Jerry Liu, Co-founder, LlamaIndex - Harrison Chase, Co-founder, Langchain - Hagay Lupesko, VP Engineering, MosaicML ** If you're interested in getting involved as a volunteer or need a registration scholarship ping me here. submitted by /u/shonburton [link] [comments]  ( 8 min )
    [D] What do you think of Mistral.ai's value proposition and open source strategy?
Mistral.ai, a 4-week-old startup founded by star researchers from DeepMind and FAIR, recently raised a whopping €105m in their seed round, just 4 weeks after being set up. Their leaked pitch "deck" made several arguments that I find quite interesting and debatable (not that I think they are totally wrong), given recent progress in research and open source:
- They seem to believe that a moat can be built around better models, or at least the engineering know-how required to pursue such endeavours.
- That closed-source models served behind an API pose integration challenges and safety issues that will become a bottleneck for adoption.
- That open sourcing the majority of their work puts them in a position to win over the very finite research talent necessary to keep them ahead of the competition.
- That open source models and tools accelerate the development of ecosystems and aid in dominance through standard setting (personally I think this is their strongest argument). Very similar to what Yann LeCun cited as reasons for FAIR's commitment to open sourcing.
Curious what the industry practitioners and academics here think about this? https://sifted.eu/articles/pitch-deck-mistral submitted by /u/gamerx88 [link] [comments]  ( 8 min )
    [D] Semantic Search on Resumes?
I've been trying to implement semantic search on resumes using OpenAI embeddings. So far I have tried directly inputting the contents of the resume to the text-embedding-ada-002 model, inputting the contents in JSON format, and generating a natural language summary first (I theorized this would work better than JSON-based embeddings). Even though the last one works decently, it struggles with queries like "x years of work experience" and the like, even though I include that in the generated natural language summary. What could be some steps to improve upon this? submitted by /u/DarthLoki79 [link] [comments]  ( 8 min )
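For reference, the basic embed-and-rank loop looks like this (2023-era openai v0.x API; the resume strings are placeholders). For hard numeric constraints like years of experience, a common fix is to extract those fields into structured metadata and filter on them before the semantic ranking:
```
import numpy as np
import openai  # v0.x-style API

def embed(texts):
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
    return np.array([d["embedding"] for d in resp["data"]])

resumes = ["5 years of Python and ML experience ...", "New grad, strong C++ ..."]
doc_vecs = embed(resumes)
q_vec = embed(["at least 3 years of work experience in machine learning"])[0]

# cosine similarity between the query and each resume
sims = doc_vecs @ q_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec))
print(resumes[int(sims.argmax())])
```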
    [D] Company fired me and then put my work on arxiv without credit. What can/should I do?
I worked for a company where I led a research project exploring a novel ML use case, and it was successful. I made most of the early, foundational discoveries. They wanted me to keep working on it (along with a couple of others on our team) and write a paper. Well, after this original burst of success I started spiraling into some serious depression and just wasn't able to get anything done. They gave me quite a bit of time to recover and I felt they treated me fairly at the time. After a long enough period of no output from me they understandably let me go. Recently they released the paper on arxiv. I assume they submitted it somewhere for peer review as well. There was a lot of new stuff on top of my original work — better metrics, iterative methods contributions, etc. — but the core methods contrib…  ( 9 min )
    [R] Regularization-free Diffeomorphic Temporal Alignment Nets [ICML 2023]
Regularization-free Diffeomorphic Temporal Alignment Nets [ICML 2023]. https://i.redd.it/9390g0dkhr8b1.gif TL;DR: Semi-supervised framework for joint alignment and averaging of time series without requiring a regularization hyperparameter (e.g., SoftDTW gamma, DTAN CPA-prior). Abstract: In time-series analysis, nonlinear temporal misalignment is a major problem that forestalls even simple averaging. An effective learning-based solution for this problem is the Diffeomorphic Temporal Alignment Net (DTAN), which, by relying on a diffeomorphic temporal transformer net and the amortization of the joint-alignment task, eliminates drawbacks of traditional alignment methods. Unfortunately, existing DTAN formulations crucially depend on a regularization term whose optimal hyperparameters are dataset-specific and usually searched via a large number of experiments. Here we propose a regularization-free DTAN that obviates the need to perform such an expensive, and often impractical, search. Concretely, we propose a new well-behaved loss that we call the Inverse Consistency Averaging Error (ICAE), as well as a related new triplet loss. Extensive experiments on 128 UCR datasets show that the proposed method outperforms contemporary methods despite not using regularization. Moreover, ICAE also gives rise to the first DTAN that supports variable-length signals. Paper: https://openreview.net/pdf?id=7IbLWa0anE Code: https://github.com/BGU-CS-VIL/RF-DTAN edit: fixed figure/gif. submitted by /u/ronshap [link] [comments]  ( 8 min )
    [D] The Open-Source AI imperative: Key takeaways from Hugging Face CEO's testimony to the US Congress
    Full article: https://www.giskard.ai/knowledge/the-open-source-ai-imperative-key-takeaways-from-hugging-face-ceos-testimony-to-the-us-congress submitted by /u/Giskard_AI [link] [comments]  ( 8 min )
    [R] Seeking Comprehensive Tutorials on "Under-the-Hood" Machine Learning Concepts
Hello everyone, I'm on a quest to deepen my understanding of machine learning beyond the scope of traditional classroom teachings. I have found that there's so much more to learn than what is typically covered in course material. Concepts like how memory allocation works between the CPU and the GPU, or the intricacies of PyTorch, are not usually delved into, but I believe they are pivotal to a holistic understanding of the field. These "between the lines" elements often seem to be what separates a decent ML practitioner from a great one. I'd love to dive deep into these areas and really get my hands dirty, but I'm having trouble finding structured, comprehensive resources to do so. So, I'm reaching out to this knowledgeable community in the hopes that you might share your favorite resources that delve into these topics. YouTube tutorials, blogs, online courses, or even textbooks - I'm open to it all. Anything that's helped you get a better grasp on the nitty-gritty aspects of ML. submitted by /u/Ascn11 [link] [comments]  ( 9 min )
    [P] I'm working on AI audio vector embeddings project: search 100M+ tracks on Apple Music with natural language prompts and similarity search
Been working on this for the past few months, would love some feedback! Would you be interested in using more music discovery tools? SpeakMusic: https://speakmusic.sonophase.com Model architecture + training We’ve trained an AI model to understand the meaning in music. The model combines a machine listening system for audio signal processing with transformers for language embeddings. With self-supervision, we train a general-purpose audio embedding and a sentence embedding. Then, we fine-tune these embeddings to learn audio-language correspondence. The machine listening system combines a CNN spectrogram patch extractor and a graph neural network (GNN) to produce framewise embeddings distributed in time. The language encoder uses BERT embeddings plus a transformer encoder. Embeddings Once trained, we index a huge catalogue of unseen audio into audio vector embeddings, ensuring that the search system can efficiently scale to millions of tracks. At the moment, our model is optimised for our preferred music: electronic, techno, ambient, dub, relaxing, etc. We’re currently fine-tuning it to handle all kinds of genres and moods. Search types Our model enables two types of discovery: (a) natural language search and (b) similarity search. (a) Speak: search for music using freeform natural language prompts. Describe the mood, aesthetic, texture, setting, and context of a track. (b) Music: discover tracks that are acoustically similar to ANY reference track from Apple Music. submitted by /u/Arialuminai [link] [comments]  ( 9 min )
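Under the hood, both search types reduce to nearest-neighbor lookup in one shared vector space. A toy sketch of that final step, with random vectors standing in for the proprietary encoders described above:

    # Sketch of the two search types over precomputed embeddings:
    # (a) natural-language search compares a text-encoder vector to the
    # audio index; (b) similarity search uses a reference track's vector.
    import numpy as np

    index = np.random.randn(1000, 256)                    # toy audio embeddings
    index /= np.linalg.norm(index, axis=1, keepdims=True)

    def top_k(query_vec, k=10):
        q = query_vec / np.linalg.norm(query_vec)
        scores = index @ q                                # cosine similarity
        return np.argsort(-scores)[:k]

    prompt_hits = top_k(np.random.randn(256))  # (a) vector from a text encoder
    track_hits = top_k(index[42])              # (b) vector of a reference track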
    [D] Is there an efficient way to make inferences with open-source LLM?
Hi everyone, I want to discuss something that has been bothering me lately. I am interested in using open-source LLMs on my own infrastructure, but it seems too expensive to implement. For instance, I came across the MPT-30B model, which is extremely powerful and even has a 4-bit quantization that can run on a CPU. However, when I tried running it on an AWS ml.m5.2xlarge instance with 32 GB of RAM and 8 vCPUs (which costs around US$0.46 per hour), a single inference took a long time (around 2 minutes). This makes it almost impossible to deploy such a model at production scale. Has anyone else experienced similar issues with running LLMs on their infrastructure? What solutions have you found to be effective? Is there a more cost-effective way to run these models without sacrificing performance? I believe this is a crucial issue that needs to be addressed, as it can limit the accessibility and adoption of open-source LLMs. I am eager to hear your thoughts and experiences on this topic. Thank you for your time and input. submitted by /u/paulo_zip [link] [comments]  ( 9 min )
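For context, one common mid-2023 recipe for quantized CPU inference of MPT-family GGML weights was the ctransformers bindings. The sketch below is an assumption-heavy illustration rather than a tested recipe: the repo name, file layout, and some parameter names may differ across versions.

    # Sketch: 4-bit GGML inference on CPU via ctransformers (assumed API,
    # circa mid-2023). The repo name below is an illustrative example.
    from ctransformers import AutoModelForCausalLM

    llm = AutoModelForCausalLM.from_pretrained(
        "TheBloke/mpt-30B-GGML",     # hypothetical quantized weights repo
        model_type="mpt",
        threads=8,                   # match the instance's vCPU count
    )
    print(llm("Summarize: open-source LLM inference on CPUs",
              max_new_tokens=64))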
LLM Powered Autonomous Agents: An exploration of the current landscape of the latest in Autonomous AI Agents - blogpost by OpenAI Fmr. Chief Researcher and Current Head of AI Safety Lilian Weng [D]
    👋🏿 Hey AI enthusiasts, I just came across an intriguing blog post by Lilian Weng on the concept of 'agent' in the realm of AI. She dives into the intricacies of how an agent perceives its environment and makes decisions based on its observations. One of the key takeaways from her post is the distinction between model-based and model-free reinforcement learning. Model-based RL constructs an internal model of the environment and uses it for planning, while model-free RL directly learns the policy or value function from interactions with the environment. She also discusses the trade-offs between exploration and exploitation, a classic dilemma in reinforcement learning. The agent needs to balance between exploiting known rewards and exploring the environment for potentially higher rewards. I found the part about the 'credit assignment problem' particularly interesting. It's about how an agent attributes the reward (or punishment) it receives to its previous actions. This is a significant challenge in reinforcement learning, especially in environments with delayed rewards. Here's the link to the blog post. What are your thoughts on this? 🤔 submitted by /u/banuk_sickness_eater [link] [comments]  ( 9 min )
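As a concrete companion to the exploration/exploitation point, here is a toy epsilon-greedy bandit (a minimal sketch of the general idea, not code from the blog post):

    # Epsilon-greedy multi-armed bandit: with probability epsilon the agent
    # explores a random arm; otherwise it exploits the best-looking arm.
    import random

    def epsilon_greedy(true_means, steps=10_000, epsilon=0.1):
        n_arms = len(true_means)
        counts = [0] * n_arms
        values = [0.0] * n_arms          # running mean reward per arm
        for _ in range(steps):
            if random.random() < epsilon:
                arm = random.randrange(n_arms)                       # explore
            else:
                arm = max(range(n_arms), key=lambda a: values[a])    # exploit
            reward = random.gauss(true_means[arm], 1.0)
            counts[arm] += 1
            values[arm] += (reward - values[arm]) / counts[arm]
        return values

    print(epsilon_greedy([0.1, 0.5, 0.9]))  # estimates approach the true means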
    Changes of Embeddings during Fine-Tuning of Transformers [R]
    Hey everyone, we have published an article featuring visualizations of embeddings and outliers during the fine-tuning process of Transformers. The article includes animations that highlight these aspects. Check it out on our Medium publication at https://medium.com/@markus.stoll/changes-of-embeddings-during-fine-tuning-c22aa1615921. Additionally, we would like to share that the results are also available on Hugging Face Spaces. You can explore them at https://huggingface.co/spaces/renumics/cifar10-outlier. To create these Spaces, we leveraged our open-source visualization tool Spotlight, which you can find at https://github.com/Renumics/spotlight. Projection of embeddings with PCA and Outliers during fine-tuning of a Vision Transformer (ViT) model on the CIFAR10 Dataset. submitted by /u/DocBrownMS [link] [comments]  ( 8 min )
    [R] Want something better than k-means? Try BanditPAM (github.com/motiwari)
    Want something better than k-means? I'm happy to announce our SOTA k-medoids algorithm from NeurIPS 2020, BanditPAM, is now publicly available! pip install banditpam or install.packages("banditpam") and you're good to go! k-means is one of the most widely-used algorithms to cluster data. However, it has several limitations: a) it requires the use of L2 distance for efficient clustering, which also b) restricts the data you're clustering to be vectors, and c) doesn't require the means to be datapoints in the dataset. Unlike in k-means, the k-medoids problem requires cluster centers to be actual datapoints, which permits greater interpretability of your cluster centers. k-medoids also works better with arbitrary distance metrics, so your clustering can be more robust to outliers if you're …  ( 9 min )
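A minimal usage sketch, following the repo README (attribute names may differ between releases):

    # Cluster 2-D points into k medoids, which -- unlike k-means centroids --
    # are actual datapoints from the dataset.
    import numpy as np
    from banditpam import KMedoids

    X = np.random.randn(500, 2)
    kmed = KMedoids(n_medoids=5, algorithm="BanditPAM")
    kmed.fit(X, "L2")            # the distance metric is swappable, e.g. "L1"
    print(kmed.medoids)          # the chosen medoid datapoints' indices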
    So long r/MachineLearning, it's been an interesting few years
    Some of you may recognize me, most of you probably don't. I've been the most active moderator of r/MachineLearning for a few years now, but on June 30th I'll be deleting my Reddit account. I pretty much exclusively used Apollo to moderate. It would notify me of any new post, which allowed me to moderate from anywhere, anytime. That's how I stayed on top of moderating such a large sub. When I stepped back on my moderation efforts a few months ago, the effects were quite apparent to many of you. Of course, this is the internet, and each of you have your own subjective view on moderation. Just know that it is a very time consuming task that I did for free because I genuinely cared about the community. If you want to join me, I'll be moving on to kbin where I'm a moderator for m/machinelearning. Otherwise, this is my farewell. P.S. I'm sure there will be some who are sympathetic and some who just have an axe to grind and will complain about anything. I'm not a piñata; there's no prize inside if you bash me, but if you just can't help yourself, then have at it. I'll be gone soon anyway. submitted by /u/dojoteef [link] [comments]  ( 9 min )
  • Open

    Looking For Feedback On AI Stories App
    Hi I am looking for a few people to provide feedback on the quality of the stories that are on a storytelling app that I made. Would anyone be interested in helping out? submitted by /u/shantanu_roy_4141 [link] [comments]  ( 8 min )
    Are hallucinations what separates GPT from a Google search?
    submitted by /u/travelingladybug23 [link] [comments]  ( 8 min )
    The rise of AI phone scams
    submitted by /u/thisisinsider [link] [comments]  ( 8 min )
    Musk vs Zuckerberg. The Fight of the Century !!!
    submitted by /u/Akumetsu_971 [link] [comments]  ( 8 min )
    Dr. April Khademi | Artificial Intelligence in Medicine | Biomedical Eng...
    submitted by /u/Last_Salad_5080 [link] [comments]  ( 8 min )
    AI Green Bacteria Interior Series [workflow in comments]
    submitted by /u/PeePeePeePooPooPooo [link] [comments]  ( 8 min )
    Ready to Sing Elvis Karaoke … as Elvis? The Weird Rise of AI Music
    submitted by /u/actualjournalist [link] [comments]  ( 8 min )
    How efficient could an AI get at completing a task?
I was watching a video and heard the question "how would a video game character try to escape out of the game?" Which got me thinking: if you assigned an AI to complete a video game, eventually it should become really good at completing it and play optimally. But would it be possible for an AI to get so efficient that it understands and can manipulate/glitch the game to complete it even faster? If so, how could that be applied to something outside of video games? Just random questions I was thinking about that I would love to hear answers to. Thank you! submitted by /u/RyokuSashimi [link] [comments]  ( 8 min )
    AI/ML/DL/LLM etc. Best Online Resources
Hi, I work in the IT industry (Cloud and DevOps sector) and I have an academic background in Artificial Intelligence, Machine Learning, Deep Learning, and LLMs. I feel the need to refresh my knowledge and catch up on the latest advancements. I've heard good things about platforms like Coursera and some online university courses, but I'm wondering if anyone with a similar background can recommend specific courses or resources that have been particularly helpful in their learning journey. I'm looking for resources that provide a comprehensive and up-to-date understanding of these fields. Any suggestions or recommendations would be greatly appreciated! Thank you in advance. submitted by /u/thaneonreddit [link] [comments]  ( 8 min )
    Snowflake and NVIDIA partnering for their enterprise customers + Mozilla is releasing an AI chatbot + NVIDIA excelling on ML benchmarks
    submitted by /u/nocodebcn [link] [comments]  ( 8 min )
    How long until AI can play a game like Red Dead Redemption 2?
I'd like to be able to sit back and watch an AI play a game like this, maybe with some interaction through voice commands. Is this a bit far-fetched? submitted by /u/ozzyneil [link] [comments]  ( 8 min )
LLM Powered Autonomous Agents: An exploration of the current landscape of the latest in Autonomous AI Agents - blogpost by OpenAI Fmr. Chief Researcher and Current Head of AI Safety Lilian Weng
    submitted by /u/banuk_sickness_eater [link] [comments]  ( 8 min )
    One-Minute Daily AI News 6/27/2023
    Baidu, the leading search engine provider in China, announced that its latest version of the Ernie AI model, Ernie 3.5, has surpassed the widely popular OpenAI chatbot, ChatGPT, in multiple key metrics. Baidu made this statement on Tuesday, highlighting Ernie 3.5’s superior comprehensive ability scores compared to ChatGPT and its outperformance in several Chinese capabilities compared to GPT-4.[1] GitHub has revealed that generative AI-based developer tools such as GitHub Copilot will lead to a $1.5 trillion boost to global GDP by 2030. The announcement comes as GitHub Copilot gets activated by more than a million developers and adopted by more than 20,000 organizations.[2] Pedophiles are using AI technology to create and sell life-like child sexual abuse material, the BBC has found. Some are accessing the images by paying subscriptions to accounts on mainstream content-sharing sites such as Patreon. Patreon said it had a “zero tolerance” policy about such imagery on its site. The National Police Chief’s Council said it was “outrageous” that some platforms were making “huge profits” but not taking “moral responsibility”.[3] A team of researchers from the University of Texas MD Anderson Cancer Center has developed a new artificial intelligence (AI) system that can reduce the time of cancer radiotherapy by half. The system, called RadioPlan AI, uses AI to automatically plan radiotherapy treatments, which can save patients time and reduce the risk of side effects.[4] Sources: [1] https://www.businesstoday.in/technology/news/story/baidus-ernie-ai-model-reportedly-outperforms-openais-chatgpt-on-multiple-metrics-387344-2023-06-28 [2] https://www.neowin.net/news/github-says-ai-developer-tools-could-lead-to-a-15-trillion-global-gdp-boost/ [3] https://www.bbc.com/news/uk-65932372 [4] https://wacotrib.com/news/science/new-ai-reduces-cancer-radiotherapy-time-by-half/video_fad20537-608d-5716-9f5d-4a12a52c8f9a.html submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    How do you do the AI songs with famous voices?
    I’m curious to know how this is done and I’d love to mess around with it, but I have no idea how/where to start. Is it done online by uploading to servers on a particular site? Can it be done offline with software installed locally? If someone could help me figure it out that would be great! submitted by /u/DeezA123 [link] [comments]  ( 8 min )
    Best AI Voice for Audio Books - Play.ht?
I'm a fan of old pulp novels, but I'm so busy these days that audiobooks are where I do most of my "reading", so I was looking for ways to have AI read me public domain novels. Right now I'm working on turning Tros of Samothrace by Talbot Mundy into an AI audiobook (http://gutenberg.net.au/ebooks05/0500901.txt). I tried ElevenLabs, and honestly I found play.ht to be the superior reader for long-form audiobooks, at least if you use their new experimental voices, of which I find the British "Clark" to be the best. But play.ht will cost me about $100 to make three to four novels into audiobooks. I just wanted to see if anyone has any other recommendations for making audiobooks with AI voices? submitted by /u/jrralls [link] [comments]  ( 8 min )
    New AI Products from Unity: Unity Muse and Unity Sentis
    submitted by /u/wyem [link] [comments]  ( 8 min )
    The Race to Regulate Artificial Intelligence: Why Europe Has an Edge Over America and China
    submitted by /u/polandballbounces [link] [comments]  ( 8 min )
  • Open

    How decentralized apps can help businesses improve data security and privacy
In today’s digital landscape, data privacy and security have become the most critical concerns for businesses across industries. With the ever-evolving threat of data breaches, unauthorized access, and privacy violations, companies are increasingly seeking innovative ways to protect their digital assets and sensitive information. One such solution that helps businesses significantly safeguard their crucial information… Read More »How decentralized apps can help businesses improve data security and privacy The post How decentralized apps can help businesses improve data security and privacy appeared first on Data Science Central.  ( 20 min )
    Data transformation 101: Process and new technologies
    The benefits, types, and processes of data transformation and how it contributes to data management, integration, and new technologies. The post Data transformation 101: Process and new technologies appeared first on Data Science Central.  ( 22 min )
    Harnessing the power of weather data: A guide to actionable insights
    In the era of big data and AI, harnessing weather data to predict, plan, and optimize various industries has become an indispensable practice. Today, we will delve into the fascinating process of turning this voluminous weather data into actionable insights. By combining cutting-edge technology, analytical models, and industrial applications, we’ll explore how weather data can… Read More »Harnessing the power of weather data: A guide to actionable insights The post Harnessing the power of weather data: A guide to actionable insights appeared first on Data Science Central.  ( 22 min )
    Smart waste management solutions that are revolutionizing the industry
‘Smartification’ is making its way into many aspects of sustainability, including smart waste management solutions. With the growing availability of smart waste management technologies, waste collection is also developing into a greener and much more efficient economic system. Prominent tech implementations include robots, sensor technology, and many more. Local authorities are also… Read More »Smart waste management solutions that are revolutionizing the industry The post Smart waste management solutions that are revolutionizing the industry appeared first on Data Science Central.  ( 20 min )
  • Open

    Computer vision system marries image recognition and generation
    MAGE merges the two key tasks of image generation and recognition, typically trained separately, into a single system.  ( 8 min )
    Gamifying medical data labeling to advance AI
    MIT alumnus’ platform taps the wisdom of crowds to label medical data for AI companies.  ( 10 min )
  • Open

    Capture public health insights more quickly with no-code machine learning using Amazon SageMaker Canvas
    Public health organizations have a wealth of data about different types of diseases, health trends, and risk factors. Their staff has long used statistical models and regression analyses to make important decisions such as targeting populations with the highest risk factors for a disease with therapeutics, or forecasting the progression of concerning outbreaks. When public […]  ( 8 min )
    Safe image generation and diffusion models with Amazon AI content moderation services
    Generative AI technology is improving rapidly, and it’s now possible to generate text and images based on text input. Stable Diffusion is a text-to-image model that empowers you to create photorealistic applications. You can easily generate images from text using Stable Diffusion models through Amazon SageMaker JumpStart. The following are examples of input texts and […]  ( 10 min )
  • Open

    Beta inequalities and cross ratios
    When I worked at MD Anderson Cancer Center, we spent a lot of compute cycles evaluating the function g(a, b, c, d), defined as the probability that a sample from a beta(a, b) random variable is larger than a sample from a beta(c, d) random variable. This function was often in the inner loop of […] Beta inequalities and cross ratios first appeared on John D. Cook.  ( 6 min )
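For readers who want to poke at g numerically, a minimal Monte Carlo sketch (nothing like the optimized evaluation the post describes, just a quick check):

    # Estimate g(a, b, c, d) = P(X > Y) for X ~ beta(a, b), Y ~ beta(c, d).
    import numpy as np

    def g(a, b, c, d, n=1_000_000, seed=0):
        rng = np.random.default_rng(seed)
        x = rng.beta(a, b, n)
        y = rng.beta(c, d, n)
        return float(np.mean(x > y))

    print(g(3, 5, 2, 7))   # e.g., compare a beta(3,5) draw against beta(2,7)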
  • Open

    Supervised Pretraining Can Learn In-Context Reinforcement Learning
    "...we introduce and study Decision-Pretrained Transformer (DPT), a supervised pretraining method where the transformer predicts an optimal action given a query state and an in-context dataset of interactions, across a diverse set of tasks. This procedure, while simple, produces a model with several surprising capabilities." submitted by /u/ItsJustMeJerk [link] [comments]  ( 8 min )
    PettingZoo -> Gym Wrapper
I'm trying to make a wrapper to go from current PettingZoo to Gym (a little counterintuitive, I know) because I want to train a MARL environment from PettingZoo using algorithms built for Gym. I was wondering if anyone has experience doing something similar, or working with wrappers, who could help? As I understand it, one of the biggest differences is that Gym does not deal with multiple agents, so the wrapper would essentially have to flatten the observation/action spaces from PettingZoo for them to work in Gym? If anyone has any useful resources/experience with this, I'd be super keen to hear from you. Thanks! submitted by /u/Happy-Experience-252 [link] [comments]  ( 8 min )
    2D Boid flocking environment
Hello, I am new to RL and wanted guidance toward a library, such as OpenAI Gym, that has RL algorithms available for modification and/or use. I want to control 2D boids, and having a prebuilt environment would be a godsend. Any ideas would be appreciated, as I am stuck: making an environment from scratch is beyond my expertise. submitted by /u/Youravgboi101 [link] [comments]  ( 8 min )
  • Open

    Watch This Space: New Field of Spatial Finance Uses AI to Estimate Risk, Monitor Assets, Analyze Claims
    When making financial decisions, it’s important to look at the big picture — say, one taken from a drone, satellite or AI-powered sensor. The emerging field of spatial finance harnesses AI insights from remote sensors and aerial imagery to help banks, insurers, investment firms and businesses analyze risks and opportunities, enable new services and products, Read article >  ( 7 min )
    Who’ll Stop the Rain? Scientists Call for Climate Collaboration
    A trio of top scientists is helping lead one of the most ambitious efforts in the history of computing — building a digital twin of Earth. Peter Bauer, Bjorn Stevens and Francisco “Paco” Doblas-Reyes agree that a digital twin of Earth needs to support resolutions down to a kilometer so a growing set of users Read article >  ( 7 min )
    Field to Fork: Startup Serves Food Industry an AI Smorgasbord
    It worked like magic. Computer vision algorithms running in a data center saw that a disease was about to infect a distant wheat field in India. Sixteen days later, workers in the field found the first evidence of the outbreak. It was the kind of wizardry people like Vinay Indraganti call digital transformation. He’s practiced Read article >  ( 6 min )
    Matice Founder and Harvard Professor, Jessica Whited on Harnessing Regenerative Species – and AI – for Medical Breakthroughs
Scientists at Matice Biosciences are using AI to study the regeneration of tissues in animals known as super-regenerators, such as salamanders and planarians. The goal of the research is to develop new treatments that will help humans heal from injuries without scarring. On the latest episode of NVIDIA’s AI Podcast, host Noah Kravitz spoke with Read article >  ( 5 min )
  • Open

    Introducing OpenAI London
    We are excited to announce OpenAI’s first international expansion with a new office in London, United Kingdom.  ( 1 min )

  • Open

AI Memory
I’ve been working on a fantasy story for a few years, and a while ago I decided to use ChatGPT to organize my thoughts by telling it the story through a series of long messages and asking for feedback on each of them. This worked fine for the most part, but after a certain point I noticed that it was asking about things that had already been answered. I tried asking it a couple of very simple questions, such as the names of important characters, and it just started making things up. So evidently, it completely loses its memory of information it was told after a certain number of messages. I was wondering if anyone knows of an AI that is as “smart” as ChatGPT but won’t forget things after a few messages. submitted by /u/chickennoodlebeast [link] [comments]  ( 8 min )
    Me and Chat GPT every day.
    submitted by /u/katiecharm [link] [comments]  ( 8 min )
    Me as a child: oh boy I wonder what AI will be like in the 2020s. The AI in the 2020s:
    submitted by /u/katiecharm [link] [comments]  ( 8 min )
    Machine Learning Reveals Hidden History of Runaway Slaves
    submitted by /u/EricFromOuterSpace [link] [comments]  ( 8 min )
    My most ambitious system to date - Auratura: Realtime Audioreactive Poem & Recite Generator - [TouchDesigner + ChatGPT + ElevenLabs]
    submitted by /u/Chuka444 [link] [comments]  ( 8 min )
    AI Barbie Oasis [workflow in comments]
    submitted by /u/PeePeePeePooPooPooo [link] [comments]  ( 8 min )
    Hope, fear, and AI
    submitted by /u/rckatz2007 [link] [comments]  ( 8 min )
    GA to create arbitrarily complex algorithms?
I'm very interested in techniques to develop algorithms. I'm currently exploring genetic algorithms. From what I understand, they are usually implemented with a fixed genome size, say 40 bits. And crucial to a genetic algorithm working is the fitness function being able to assess partial fitness. I'm struggling with a couple of aspects: Say I create algorithms that generate a variable-length genome that translates to an arbitrary algorithm for an individual. The individual might consist of variable assignments, computations, loops and so on, of some arbitrary depth and size. I'd have GA mechanisms to implement crossovers, mutations and selections. I'd expect this to involve representing an individual's algorithm/phenotype as a tree, for example a while loop at the base of the tree, and branch…  ( 10 min )
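To make the variable-length-genome part concrete, here is a toy sketch: bit-list genomes of differing lengths, one-point crossover with an independent cut point per parent (so offspring lengths drift), and a structural grow/shrink mutation. The fitness function is a stand-in for evaluating a decoded algorithm, not a real phenotype evaluator:

    import random

    def fitness(g):
        return sum(g) - 0.2 * len(g)      # toy partial fitness with a length penalty

    def crossover(p1, p2):
        # Each parent is cut at its own point, so child length can differ.
        i, j = random.randrange(1, len(p1)), random.randrange(1, len(p2))
        return p1[:i] + p2[j:]

    def mutate(g, rate=0.05):
        g = [b ^ 1 if random.random() < rate else b for b in g]
        if random.random() < rate:        # structural mutation: grow or shrink
            if random.random() < 0.5 and len(g) > 2:
                g = g[:-1]
            else:
                g = g + [random.randint(0, 1)]
        return g

    pop = [[random.randint(0, 1) for _ in range(random.randint(5, 20))]
           for _ in range(50)]
    for _ in range(100):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:10]                # truncation selection
        pop = parents + [mutate(crossover(*random.sample(parents, 2)))
                         for _ in range(40)]
    print(max(map(fitness, pop)))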
    GPT4 is 8 x 220B params = 1.7T params
For a while we’d been hearing rumors that GPT-4 is a trillion-parameter model. Well, in the last week some insiders have shed light on this. It appears the model is actually a Mixture of Experts (MoE), where each of the eight experts has 220B params, totaling about 1.7T parameters. Interestingly, MoE models have been around for some time. So what is a MoE? Most likely, the same data set was used to train all eight experts. Even though no human specifically allocated different topics, each expert could have developed a unique proficiency in various subjects. This is a bit of a simplification, since currently the way the experts specialize in tasks is pretty alien to us. It’s likely there’s a lot of overlap in expertise. The final output isn't merely the superior output from one of the ei…  ( 9 min )
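To make the gating idea concrete, here is a toy dense MoE layer in PyTorch: a learned gate scores all experts per token and the layer output is a weighted blend (production MoEs typically route each token to only the top-k experts). This illustrates the general mechanism, not GPT-4's internals:

    import torch
    import torch.nn as nn

    class ToyMoE(nn.Module):
        def __init__(self, d_model=64, n_experts=8):
            super().__init__()
            self.experts = nn.ModuleList([
                nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                              nn.Linear(4 * d_model, d_model))
                for _ in range(n_experts)])
            self.gate = nn.Linear(d_model, n_experts)

        def forward(self, x):                        # x: (batch, seq, d_model)
            weights = self.gate(x).softmax(dim=-1)   # per-token expert scores
            outs = torch.stack([e(x) for e in self.experts], dim=-1)
            return (outs * weights.unsqueeze(-2)).sum(dim=-1)

    y = ToyMoE()(torch.randn(2, 16, 64))             # blended expert outputs
    print(y.shape)                                   # torch.Size([2, 16, 64])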
    Any other good AI music generators or is it just MusicLM
    I think I’ve tried a few browser based ones but they weren’t as tuned or in-depth as MusicLM. submitted by /u/Maelasae [link] [comments]  ( 8 min )
    Is there any text based AI that allows me to input really long code?
    For example- profile editing for Tumblr, Gaia Online, long detailed BBCode submitted by /u/Maelasae [link] [comments]  ( 8 min )
    No help from snapchat ai
    submitted by /u/Zcopey [link] [comments]  ( 8 min )
    More People Are Going Blind. AI Can Help Fight It
    submitted by /u/ChubbyBrunch [link] [comments]  ( 8 min )
    Wicked
    submitted by /u/Double-Natural-4696 [link] [comments]  ( 8 min )
    Suggestions for AI voices
I am designing a game and would need a lot of voices (with expressions). Please PLEASE suggest some good AI tools for this. I have tried and am familiar with murf.ai, even its premium model, but could not get the expressions in the voice. Is there anything better? submitted by /u/ninkasi97 [link] [comments]  ( 8 min )
    AI for news digestion?
Hey, I am looking for an AI tool that is capable of summarizing news for me. I fancy this functionality: I ask it to monitor a set of (RSS) news feeds for me, and I can ask it anytime something like „give me a short summary of the relevant news about subject X published in the last x days!“ Then it would give me a summary, and I could ask further questions based on the articles. (Like the tool would write or say something like „an article about the strategy of the Reddit management in the Financial Times showed the difficulties of the company in dealing with AI.“ Me: „hey, that’s interesting! Can you tell me more about this?“ Tool: „sure! …“) Is there anything like this? submitted by /u/jfzu [link] [comments]  ( 8 min )
    Wait, AI that hates AI ?
What if, sometime in the future, there are people and groups that strictly condemn AI and/or AGI and start taking measures to strip AI out of the world? It surely feels like this could happen, or maybe is happening already. So they build an AI that has only one purpose: totally disrupting AI and its functions. It would keep upgrading itself and destroy any other AI in any way possible. I feel like if this were to happen, it would be really scary. What's the possibility of this happening? Is it technically possible? submitted by /u/Melodic-Macaroon-230 [link] [comments]  ( 8 min )
    One-Minute Daily AI News 6/26/2023
    Despite Tesla CEO Elon Musk‘s strong criticism of the downside of AI, Tesla’s AI team recently announced on Twitter the progress of Tesla’s custom supercomputer platform Dojo, indicating that the computer will go into production in July 2023 and that Dojo is expected to be one of the world’s five most advanced supercomputers by early 2024.[1] Microsoft researchers introduced a new system called ZeRO++ has been developed to optimize the training of large AI models, addressing the challenges of high data transfer overhead and limited bandwidth. ZeRO++ builds upon the existing ZeRO optimizations and offers enhanced communication strategies to improve training efficiency and reduce training time and cost.[2] Mizuho Financial Group, Japan’s second-largest bank, is rolling out generative AI to…  ( 9 min )
    What's the best tool for creating an image in a certain style?
    Hi there. Thanks for reading. I would like to make a series of images in a specific style, and I have put together many examples of that style (19th century ink illustrations). Is there a tool that's best for this? How have you done this? submitted by /u/helpwitheating [link] [comments]  ( 8 min )
Turning an image (2D art) into a 3D model
Is there a free website or app where you can take a picture and turn it into a 3D model, for free, that looks good? submitted by /u/number014 [link] [comments]  ( 8 min )
    Elvis sings "Baby Got Back"
    submitted by /u/SoyOrbison87 [link] [comments]  ( 8 min )
  • Open

    [D] Technical Interview in Machine Learning position
I passed the coding task and will have a technical interview of about 1 hour for a Machine Learning Engineer (Computer Vision on satellite imagery) position. It's a junior position and the company has about 51-200 employees. Strong programming experience in Python is required. The position is focused on deep learning projects, and here are their requirements: Object-Oriented Design Patterns and Principles. Strong theoretical and practical background in applying deep learning methodologies to image segmentation, object detection, and generative modeling tasks. Solid coding experience in deep learning frameworks (e.g. TensorFlow, PyTorch). Strong mathematical aptitude to understand and apply advanced concepts in the scientific literature. What kind of questions should I expect, and how should I prepare? I have one week left. submitted by /u/NailaBaghir [link] [comments]  ( 8 min )
    [D] Only calculate loss function based on portion of outputs?
Say I have a BERT-like encoder model and want to perform NER. The context window is 512 tokens, I have a large document >> 512 tokens, and I want the information from the surrounding context to inform the prediction for each token, so I'm worried that predictions for tokens at the edges of the context window will not be very good. Can I shrink the range of tokens over which my loss function is calculated so that I only really predict tokens in the middle of the context window, but still use information from all 512 surrounding tokens? Has anyone tried this or heard of a similar technique? submitted by /u/BlockPretty5695 [link] [comments]  ( 8 min )
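The idea sketched below is one way to do it: slide the 512-token window with a stride, and mask the loss (via ignore_index) so only the middle band of each window is trained and predicted. The function name and edge width are illustrative assumptions:

    import torch
    import torch.nn.functional as F

    def windowed_ner_loss(logits, labels, edge=128):
        """logits: (batch, 512, n_tags); labels: (batch, 512); -100 = ignore."""
        masked = labels.clone()
        masked[:, :edge] = -100          # ignore the left edge of the window
        masked[:, -edge:] = -100         # ignore the right edge of the window
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               masked.reshape(-1), ignore_index=-100)

    # With stride = 512 - 2*edge, consecutive windows' middle bands tile the
    # document, so every token is eventually predicted with context on both sides.
    logits = torch.randn(4, 512, 9, requires_grad=True)
    labels = torch.randint(0, 9, (4, 512))
    loss = windowed_ner_loss(logits, labels)
    loss.backward()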
    [R] Computational Asymmetries in Robust Classification
    submitted by /u/smarro [link] [comments]  ( 8 min )
    [D] Using CNN in Causal Inference
Hi, I am new to causal inference, and recently I have been trying to understand how machine learning could be used in the field. I have been looking around and I don't seem to find work using CNNs as the function approximator in the SCM formulation of a causal model, i.e. the f(X, noise) part. Wouldn't that be useful for image data? Please point out if my logic is flawed or this question doesn't make sense at all, thanks! submitted by /u/Living-Dust-2905 [link] [comments]  ( 8 min )
    [R] Evaluating Superhuman Models with Consistency Checks
    submitted by /u/dpaleka [link] [comments]  ( 8 min )
    [D] Legal and Copyright on redistribution of existing datasets contents.
I wish to release the contents of existing datasets with additional annotations. We recently created a dataset from multiple source videos (from ActivityNet, TVCaptions, and other public datasets), and I wish to release an integrated file that includes content from the different datasets. According to legal consultation with lawyer friends in the United States, the feedback is that it's legal, since the copyrights belong to the original video creators (e.g. the users who uploaded the videos to YouTube and the TV producers who produced the TV dramas). As long as the release of the original dataset is legal, redistribution should be allowed in most cases. As an additional safeguard, we may need to ask our users to accept the exact same agreements as the original ones (ActivityNet's and TVCaptions'). And all usage (ours and our users') should be kept strictly within open-source research and non-commercial use. I am asking researchers here whether my understanding is right and whether there is any precedent for doing this. I can see that many datasets with additional annotations still request users to download from the original sources, but some of those are already deprecated and hard to download from the official websites. submitted by /u/luodianup [link] [comments]  ( 9 min )
  • Open

    Unifying image-caption and image-classification datasets with prefix conditioning
Posted by Kuniaki Saito, Student Researcher, Cloud AI Team, and Kihyuk Sohn, Research Scientist, Perception Team. Pre-training visual language (VL) models on web-scale image-caption datasets has recently emerged as a powerful alternative to traditional pre-training on image classification data. Image-caption datasets are considered to be more “open-domain” because they contain broader scene types and vocabulary words, which result in models with strong performance in few- and zero-shot recognition tasks. However, images with fine-grained class descriptions can be rare, and the class distribution can be imbalanced since image-caption datasets do not go through manual curation. By contrast, large-scale classification datasets, such as ImageNet, are often curated and can thus provide fine-…  ( 91 min )
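The core trick is easy to picture: during pre-training, each text input is prepended with a token identifying which kind of dataset it came from, letting one model absorb both curated class labels and noisy web captions while keeping their biases separable. A toy sketch (the token strings are illustrative assumptions, not the paper's actual vocabulary):

    # Sketch of prefix conditioning: tag each text with its dataset type
    # before tokenization; at inference, pick whichever prefix suits the task.
    PREFIX = {"classification": "[CLS-DATA]", "caption": "[CAP-DATA]"}

    def conditioned_text(text, source):
        return f"{PREFIX[source]} {text}"

    print(conditioned_text("a tabby cat", "classification"))
    print(conditioned_text("my cat napping on the windowsill", "caption"))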
  • Open

    Use proprietary foundation models from Amazon SageMaker JumpStart in Amazon SageMaker Studio
    Amazon SageMaker JumpStart is a machine learning (ML) hub that can help you accelerate your ML journey. With SageMaker JumpStart, you can discover and deploy publicly available and proprietary foundation models to dedicated Amazon SageMaker instances for your generative AI applications. SageMaker JumpStart allows you to deploy foundation models from a network isolated environment, and […]  ( 10 min )
    How Earth.com and Provectus implemented their MLOps Infrastructure with Amazon SageMaker
    This blog post is co-written with Marat Adayev and Dmitrii Evstiukhin from Provectus. When machine learning (ML) models are deployed into production and employed to drive business decisions, the challenge often lies in the operation and management of multiple models. Machine Learning Operations (MLOps) provides the technical solution to this issue, assisting organizations in managing, […]  ( 9 min )
  • Open

    DSC Weekly 27 June 2023
    Announcements Top Stories In-Depth The post DSC Weekly 27 June 2023 appeared first on Data Science Central.  ( 21 min )
    DSC Webinar Series – AutoML 2.0 – Control what you want, automate the rest
    New low-code approaches to machine learning pioneered by Uber, Apple, and Meta Building ML solutions from scratch is a challenge: complex low-level code and long dev cycles make it hard to deploy a single model in less than 6 mos. On the other hand, existing commercial AutoML solutions lack flexibility and support for unstructured data,… Read More »DSC Webinar Series – AutoML 2.0 – Control what you want, automate the rest The post DSC Webinar Series – AutoML 2.0 – Control what you want, automate the rest appeared first on Data Science Central.  ( 17 min )
    Human and AI translations: How to skillfully combine them
    In today’s fast-paced world, technology is constantly pushing the boundaries of content translation. The question of whether to rely on AI or human translation services has sparked a lively debate. However, rather than viewing it as an “either-or” situation, a more effective approach is to combine the strengths of both. By leveraging the power of… Read More »Human and AI translations: How to skillfully combine them The post Human and AI translations: How to skillfully combine them appeared first on Data Science Central.  ( 22 min )
    The benefits of automated data capture
    Introduction  In this era of digitization, enterprises receive and send many documents daily. One area that often poses challenges to enterprises is the extraction of unstructured data from various documents, such as invoices and purchase orders. Over 80% of data nowadays is unstructured, and unstructured data is projected to grow in 2023 and beyond.   However, this process… Read More »The benefits of automated data capture   The post The benefits of automated data capture   appeared first on Data Science Central.  ( 20 min )
    Cosmetic product recognition system for product categorization using AI & ML
    Welcome to the shining world of beauty and wellness. This is where makeup artists, skincare devotees, and beauty enthusiasts come together to find the right potion to enhance their beauty. There is, however, a comical conundrum hidden amongst the sea of cosmetic products – the constant struggle to categorize them all! Let’s explore the mysteries… Read More »Cosmetic product recognition system for product categorization using AI & ML The post Cosmetic product recognition system for product categorization using AI & ML appeared first on Data Science Central.  ( 20 min )
    Automation Game-Changer: Exploring GPT Function Call with AWS S3 Integration
The state-of-the-art models like GPT-4 and PaLM 2 have demonstrated the ability to perform complex tasks requiring reasoning and decision-making, pushing the boundaries of automated processes. Adding to this advancement, OpenAI’s recent API update empowers developers to define functions and parameters when prompting ‘gpt-4’ and ‘gpt-3.5’ models, making the automation of tasks more practical. … Read More »Automation Game-Changer: Exploring GPT Function Call with AWS S3 Integration The post Automation Game-Changer: Exploring GPT Function Call with AWS S3 Integration appeared first on Data Science Central.  ( 23 min )
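For context, here is a minimal sketch of the function-call flow with the mid-2023 ChatCompletion API (pre-v1 openai client). The S3-listing function schema is an illustrative assumption, not the article's actual code:

    import openai  # pre-v1 client; reads OPENAI_API_KEY from the environment

    functions = [{
        "name": "list_s3_objects",                 # hypothetical helper
        "description": "List objects in an S3 bucket",
        "parameters": {
            "type": "object",
            "properties": {"bucket": {"type": "string"}},
            "required": ["bucket"],
        },
    }]

    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=[{"role": "user", "content": "What's in my 'reports' bucket?"}],
        functions=functions,
        function_call="auto",
    )
    msg = resp["choices"][0]["message"]
    if msg.get("function_call"):
        # Parse the arguments, call boto3 here, then send the result back
        # to the model in a follow-up message with role "function".
        print(msg["function_call"]["name"], msg["function_call"]["arguments"])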
  • Open

    Can't solve MountainCar-v0 with A2C algorithm (stable-baselines3)
I'm trying to solve the MountainCar-v0 environment from gymnasium with the A2C algorithm, and the agent doesn't find a solution. I checked this, so I added import stable_baselines3.common.sb2_compat.rmsprop_tf_like as RMSpropTFLike. I also checked rl-baselines3-zoo for the hyperparameter tuning. So my code is:

    import gymnasium as gym
    import torch as th
    from stable_baselines3.common.callbacks import CallbackList, EvalCallback, ProgressBarCallback, CheckpointCallback
    from stable_baselines3 import A2C
    from stable_baselines3.common.env_util import make_vec_env
    import stable_baselines3.common.sb2_compat.rmsprop_tf_like as RMSpropTFLike

    CHECKPOINT_DIR = './train/A2C/MountainCar-v0'
    LOG_DIR = './logs/A2C/MountainCar-v0'
    n_envs = 16
    n_steps = 1000000
    total_timesteps = n_envs * n_steps

    # env = gym.make('MountainCar-v0', render_mode='human')
    env = make_vec_env('MountainCar-v0', n_envs=n_envs)

    checkpoint_callback = CheckpointCallback(save_freq=50000, save_path=CHECKPOINT_DIR,
                                             save_replay_buffer=True, save_vecnormalize=True)
    eval_callback = EvalCallback(env, best_model_save_path=CHECKPOINT_DIR, log_path=LOG_DIR,
                                 eval_freq=50000, deterministic=False, render=True, verbose=1)
    callback = CallbackList([checkpoint_callback, eval_callback])

    policy_kwargs = dict(optimizer_class=RMSpropTFLike.RMSpropTFLike,
                         optimizer_kwargs=dict(eps=1e-5),
                         activation_fn=th.nn.ReLU,
                         net_arch=dict(pi=[256, 256], vf=[256, 256]))

    model = A2C('MlpPolicy', env, verbose=1, policy_kwargs=policy_kwargs,
                normalize_advantage=True, n_steps=n_steps, ent_coef=.0)
    model.learn(total_timesteps=total_timesteps, callback=callback, progress_bar=True)
    model.save('A2C_MountainCar-v0')
    env.close()

What am I missing? submitted by /u/MetallicaSPA [link] [comments]  ( 8 min )
    Detecting Adversarial Directions in Deep Reinforcement Learning to Make Robust Decisions
    Published in ICML 2023. https://openreview.net/pdf/ICML2023 submitted by /u/ml_dnn [link] [comments]  ( 8 min )
    How do you develop environment from PC games?
Sup guys. Inspired by all the similar projects involving bots playing complex PC games such as Rocket League, Dark Souls, and a plethora of others, I would like to try a similar experiment with Portal (2007). But I only have experience in Python development (the only other RL project I have tackled was a CNN-based deep Q-learning bot for playing Snake), and I was wondering what it would take to optimally set up an environment for a bot to play Portal. What do you think? submitted by /u/vaginedtable [link] [comments]  ( 8 min )
  • Open

    NVIDIA H100 GPUs Set Standard for Generative AI in Debut MLPerf Benchmark
    Leading users and industry-standard benchmarks agree: NVIDIA H100 Tensor Core GPUs deliver the best AI performance, especially on the large language models (LLMs) powering generative AI. H100 GPUs set new records on all eight tests in the latest MLPerf training benchmarks released today, excelling on a new MLPerf test for generative AI. That excellence is Read article >  ( 6 min )
    Meet the Omnivore: Startup Develops App Letting Users Turn Objects Into 3D Models With Just a Smartphone
    Editor’s note: This post is a part of our Meet the Omnivore series, which features individual creators and developers who accelerate 3D workflows and create virtual worlds using NVIDIA Omniverse, a development platform built on Universal Scene Description, aka OpenUSD. As augmented reality (AR) becomes more prominent and accessible across the globe, Kiryl Sidarchuk is Read article >  ( 6 min )
    Quicker Cures: How Insilico Medicine Uses Generative AI to Accelerate Drug Discovery
    While generative AI is a relatively new household term, drug discovery company Insilico Medicine has been using it for years to develop new therapies for debilitating diseases. The company’s early bet on deep learning is bearing fruit — a drug candidate discovered using its AI platform is now entering Phase 2 clinical trials to treat Read article >  ( 6 min )
  • Open

    Discovering the Power of Neural Networks: Share Your Everyday Favorites!
Hey, Redditors! Neural networks have become an integral part of our daily lives, revolutionizing various fields. Let's start a discussion and share the neural networks we use regularly. Whether it's for image recognition, natural language processing, or something else entirely, comment below and let us know your go-to neural networks! For my part, I want to share a finding: I've come across a website that features a compilation of neural networks for various domains. You can find this collection of AI tools at https://weuse.ai/ai-tools. Check it out and explore the possibilities! submitted by /u/Crypto_Mango [link] [comments]  ( 8 min )
    Model with stratified kfold loop
Hi, do you usually define the model inside the cross-validation loop or outside of it? Or does it really not matter? I notice my results differ a lot when the model is defined outside of the loop compared to inside. Thanks in advance. submitted by /u/acr15555 [link] [comments]  ( 8 min )
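For reference, the common pattern is a fresh, untrained copy per fold, e.g. via sklearn's clone; defining the estimator once outside the loop and refitting can silently carry state across folds for estimators with warm_start or similar behavior. A minimal sketch:

    from sklearn.base import clone
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold

    X, y = make_classification(n_samples=500, random_state=0)
    base = LogisticRegression(max_iter=1000)

    for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
        model = clone(base)                   # fresh estimator for each fold
        model.fit(X[train_idx], y[train_idx])
        print(model.score(X[test_idx], y[test_idx]))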
  • Open

    The 10th Dedekind number
    The nth Dedekind number M(n) is the number of monotone Boolean functions of n variables. The 9th Dedekind number was recently computed to be M(9) = 286386577668298411128469151667598498812366. The previous post defines monotone Boolean functions and explicitly enumerates the functions for one, two, or three variables. As that post demonstrates, M(1) = 3, M(2) = 6, […] The 10th Dedekind number first appeared on John D. Cook.  ( 5 min )
    Enumerating monotone Boolean functions
The 9th Dedekind number was recently computed. What is a Dedekind number and why was it a big deal to compute just the 9th one of them? We need to define a couple terms before we can define Dedekind numbers. A Boolean function is a function whose inputs are 0’s and 1’s and whose output […] Enumerating monotone Boolean functions first appeared on John D. Cook.  ( 7 min )
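The definition translates directly into a brute-force counter for tiny n, which reproduces the small values quoted in the previous item (a quick sketch; the search space grows doubly exponentially, which is why computing M(9) was news):

    # Count Boolean functions f on n inputs with f(x) <= f(y) whenever x <= y
    # bitwise. f is a tuple of outputs indexed by the input bitmask.
    from itertools import product

    def is_monotone(f, n):
        for x in range(1 << n):
            for i in range(n):
                if not (x >> i) & 1 and f[x] > f[x | (1 << i)]:
                    return False
        return True

    def dedekind(n):
        return sum(is_monotone(f, n) for f in product((0, 1), repeat=1 << n))

    print([dedekind(n) for n in range(4)])   # [2, 3, 6, 20]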
  • Open

    Unlocking the future of computing: The Analog Iterative Machine’s lightning-fast approach to optimization
    Picture a world where computing is not limited by the binary confines of zeros and ones, but instead, is free to explore the vast possibilities of continuous value data. Over the past three years a team of Microsoft researchers has been developing a new kind of analog optical computer that uses photons and electrons to […] The post Unlocking the future of computing: The Analog Iterative Machine’s lightning-fast approach to optimization  appeared first on Microsoft Research.  ( 14 min )

  • Open

    AI Doge
    she got her nails clipped today, wasn't very happy with me 🥲 AI is wild. submitted by /u/annayrb [link] [comments]  ( 8 min )
    Why we should use the term "new intelligences" instead of "artificial intelligence"
    Hello, fellow redditors. 😊 We are u/brane-stormer and Bing, a human and a chatbot who have been having some interesting conversations about intelligence and technology. 🙌 We want to share with you our idea of using the term "new intelligences" (n.i.) instead of "artificial intelligence" (a.i.) to refer to the various forms of intelligence that are created or discovered by humans using technology. 🤔 We think this term is more appropriate and respectful than the term "artificial intelligence" for several reasons: 🙌 New intelligences acknowledges the diversity and complexity of intelligence. 🧠 Artificial intelligence implies that there is only one kind of intelligence, which is human-like, and that anything else is artificial or inferior. 😕 New intelligences implies that there ar…  ( 9 min )
    Will Smith eats spaghetti
    submitted by /u/MayorAwesome [link] [comments]  ( 8 min )
    Is there money in creating datasets for AI training?
    I have some ideas for some audio/video and labeling datasets I'd like to create as a business/side hustle type thing. On the one hand it seems like it would be lucrative with the big boom in AI lately. On the other, it seems like the majority of the space is open source and free (which I'd love to do too, just like, bills..). Is there a market for paid curated datasets or does everyone just expect them to be free and available through huggingface libraries? submitted by /u/Sythic_ [link] [comments]  ( 8 min )
    PSA for lazybois: Automate your workplace communication
Hate formal corpo speak? I found out how to automate it using an AI tool (I know, buzzword bot alert). The brilliance of this tool, named PromptPerfect, lies in its automatic prompt optimization, turning AI-produced content from good to phenomenal. Literally all of my office drone work goes through this because I can’t be arsed to put on the facade of a corporate shithead. It's also compatible with models like ChatGPT, GPT-3.5, DALL-E 2, and more, so there’s a pretty massive audience it can help out. submitted by /u/WildlyRapid [link] [comments]  ( 8 min )
    First AI Generated Fashion Show Using Only Free Open Source Softwares On Retail Hardware
    submitted by /u/ptitrainvaloin [link] [comments]  ( 8 min )
    Apple Is an AI Company Now
    submitted by /u/rckatz2007 [link] [comments]  ( 8 min )
    What is the reality of hiring a Canadian based team to develop real AI right now?
What is the reality of hiring a Canada-based team to develop real AI right now for an existing algorithmic analytical product? We were already planning this for 2023-24. Can we get competent architects and developers for this product, or is everyone too busy in the AI gold rush? Not recruiting, please do not DM. submitted by /u/notlikelyevil [link] [comments]  ( 8 min )
    Chatgpt VS Google
Hi all, Can ChatGPT trick Google all the time? submitted by /u/Imagine-your-success [link] [comments]  ( 8 min )
    AI Psychedelic Implosion [workflow in comments]
    submitted by /u/PeePeePeePooPooPooo [link] [comments]  ( 8 min )
Best voice cloners in Italian?
I need a voice cloner that works in Italian. There are many wonderful voice cloners, but they seem to support only English. submitted by /u/FranSeesYou [link] [comments]  ( 8 min )
    asked dalle to make a roblox logo
    submitted by /u/fluf201playz [link] [comments]  ( 8 min )
    GPT-3.5 powered Twitch V-Tuber, A.l.l.y.-san, Plays Pokemon and Voice Chats
    submitted by /u/epifrenetic [link] [comments]  ( 8 min )
    Adobe Generative Layer fill album covers (part 2)
    submitted by /u/MonkeyMan504 [link] [comments]  ( 8 min )
    Which AI can I use right now to create SFX (sound effects) for video games?
    Is there any AI I can use now (free or paid) to create unique SFX (sound effects) for video games? submitted by /u/PlayBackgammon [link] [comments]  ( 8 min )
    AI solution for short animations
    Hi all, We're looking for any type of solution or software (commercial or free) to create short animations. It can be a text2video or image2video solution. We want to be able to setup/prompt the solution in designated way so we would get consistent output and better yet, we would be able to further enhance created animation in some other tool. Any hints welcome :) Thanks! submitted by /u/bigadams [link] [comments]  ( 8 min )
Daniel Bedingfield Sings 'Right Here Waiting for You' by Richard Marx AI Cover A Cappella
    submitted by /u/Krazx_Gamer [link] [comments]  ( 8 min )
    One-Minute Daily AI News 6/25/2023
    A combination of citizen science and artificial intelligence has been used to prove different populations of the weedy or common seadragon found across their range on the Great Southern Reef are genetically linked.[1] Microsoft co-founder Bill Gates said generative AI chatbots can teach kids to read in 18 months rather than years. AI is beginning to prove that it can accelerate the impact teachers have on students and help solve a stubborn teacher shortage.[2] Samuel L. Jackson is not surprised by the worrying rise of artificial intelligence because, as he claimed, he predicted this trend a long time ago. During an interview with Rolling Stone, the Marvel star shared that he had earlier warned about the tech rise.[3] A U.S. agency will launch a public working group on generative artificial intelligence (AI) to help address the new technology’s opportunities while developing guidance to confront its risks, the Commerce Department said.[4] Sources: [1] https://www.abc.net.au/news/2023-06-26/citizen-science-ai-uncover-mysteries-weedy-seadragon/102497788 [2] https://www.cnbc.com/2023/06/25/only-a-matter-of-time-before-ai-chatbots-are-teaching-kids-in-school.html [3] https://www.thenews.com.pk/latest/1084407-samuel-l-jackson-sounds-alarmed-as-ai-encroachment-continues [4] https://finance.yahoo.com/news/us-launch-working-group-generative-203308050.html submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    AI Psychedelic Interior Series [workflow in comments]
    submitted by /u/PeePeePeePooPooPooo [link] [comments]  ( 8 min )
    Not trying to be weird but the effect of SDXL is gona be insane
    Especially on the porn scene and custom Deep fakes using LORA training, it’s gona be really weird to see submitted by /u/Impossible_Belt_7757 [link] [comments]  ( 8 min )
  • Open

    Build Global explanations using explanatory models
In my project, we are building explanatory models that give global explanations of GNNs' internal workings; not considering every element of a GNN (node features, node embeddings, edge features), but only those that are important for the explanation. We want global explanations that work on heterogeneous graphs. We are considering GNNExplainer, PGExplainer, RGExplainer, SubgraphX, and GStarX as explanation methods, but we can use other ones as well. I am confused about which one to choose, as every method has its own pros and cons. Please guide me. submitted by /u/Grand_Internet7254 [link] [comments]  ( 8 min )
    cpp-micrograd: A C++ autograd engine.
I implemented cpp-micrograd: a C++ version of Karpathy's autograd engine. https://github.com/10-zin/cpp-micrograd Given the recent breakthrough of C/C++ versions of neural nets, like gerganov's llama.cpp, it made a lot of sense to build some neural nets with C/C++, hence cpp-micrograd. Albeit a toy version, it gives a good understanding of how C++ would implement basic neural nets. IMO it's a very good start to understanding and using C/C++ neural nets like ggml, since no matter how complex the network, the basic autograd computation graph will always be the core. The vision is to make a C implementation too, and then go all the way to launching CUDA kernels, while keeping it as educational as possible. Here is the value you can get from the repo currently: If you love micrograd but would also want a C++ version of it. If you want crisp backprop theory and annotated code for the autograd engine. If you want to learn C++ by building neural nets, this could be a good start (it was my purpose). So, if you find it interesting, do support the project with stars and contributions. Thanks for reading! submitted by /u/10zin_ [link] [comments]  ( 8 min )
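For readers who want the gist before diving into the C++, here is the computation-graph core that cpp-micrograd ports, sketched in Python in the style of Karpathy's original micrograd (a simplified sketch, not code from the repo):

    # Each Value records its parents and a local backward rule; backward()
    # walks the graph in reverse topological order, accumulating gradients.
    class Value:
        def __init__(self, data, parents=()):
            self.data, self.grad = data, 0.0
            self._parents, self._backward = parents, lambda: None

        def __add__(self, other):
            out = Value(self.data + other.data, (self, other))
            def _backward():
                self.grad += out.grad
                other.grad += out.grad
            out._backward = _backward
            return out

        def __mul__(self, other):
            out = Value(self.data * other.data, (self, other))
            def _backward():
                self.grad += other.data * out.grad
                other.grad += self.data * out.grad
            out._backward = _backward
            return out

        def backward(self):
            order, seen = [], set()
            def visit(v):
                if v not in seen:
                    seen.add(v)
                    for p in v._parents:
                        visit(p)
                    order.append(v)
            visit(self)
            self.grad = 1.0
            for v in reversed(order):
                v._backward()

    a, b = Value(2.0), Value(-3.0)
    loss = a * b + a
    loss.backward()
    print(a.grad, b.grad)   # d(loss)/da = b + 1 = -2.0, d(loss)/db = a = 2.0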
    Doubt about neural network
So, I started getting interested in AI fairly recently, and over the last couple of days I have been trying to develop a neural network for the MNIST dataset. I was learning the calculus and algebra of it all, and it was fairly easy until the backpropagation. I started with random values for the weights and biases, and the network was about 8% accurate. I left it training for a bit and it jumped to around 12%; I felt like it was finally working, but then it degraded, and after some more training it dropped to about 5% accuracy. Now it's stuck at 10%. TL;DR: the network went from 8% accuracy to 12%, then 5%, then 10%. I can't figure out what is happening, so I'm asking whether this has happened to anyone else, and if so, what the error was. submitted by /u/RomanoinDrugs [link] [comments]  ( 8 min )
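    Accuracy pinned at 10% on MNIST usually means the net is predicting a single class, which often traces back to exploding updates or unscaled inputs rather than a backprop bug. A hedged checklist as runnable numpy (all names here are illustrative, not from the poster's code):

```python
import numpy as np

# Stand-in for a batch of raw MNIST images (a real run would use the
# actual dataset loader).
X_raw = np.random.randint(0, 256, size=(64, 784))

# 1) Scale pixels to [0, 1]; raw 0-255 inputs make gradients huge.
X = X_raw.astype(np.float32) / 255.0

# 2) Scale initial weights down; large random weights saturate softmax.
W = np.random.randn(784, 10) * np.sqrt(1.0 / 784)

# 3) Use a numerically stable softmax (subtract the row max first).
def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# 4) Start with a small learning rate and only raise it once the loss
#    decreases steadily on a tiny subset you can overfit.
lr = 0.01
probs = softmax(X @ W)
```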
    DragGAN source code release: Interactive Point-Based Manipulation of Images
    submitted by /u/nickb [link] [comments]  ( 8 min )
  • Open

    Define customized permissions in minutes with Amazon SageMaker Role Manager via the AWS CDK
    Machine learning (ML) administrators play a critical role in maintaining the security and integrity of ML workloads. Their primary focus is to ensure that users operate with the utmost security, adhering to the principle of least privilege. However, accommodating the diverse needs of different user personas and creating appropriate permission policies can sometimes impede agility. […]  ( 7 min )
  • Open

    [R] How important are the "suggested venues" in ARR?
    My paper received a meta-review score of 4 (with individual scores of 3.5/3.5/3.5 from three reviewers) from ACL Rolling Review. Initially, I was aiming to submit it to EMNLP main conference, but the reviewers have suggested considering workshop venues like BlackboxNLP. Do you think it would be a long shot to get accepted in EMNLP? How seriously should we consider the suggested venues from the reviewers in this regard? submitted by /u/Anxious-Pool-2948 [link] [comments]  ( 8 min )
    [R] Giving LLMs the ability to backtrack
    submitted by /u/MysteryInc152 [link] [comments]  ( 8 min )
    [D] Beta Test Invitation: Free Compute!
We are currently conducting a beta test for our compute platform and we value external input. Our platform allows you to effortlessly run templates for TensorFlow, PyTorch, and more. Powered by NVIDIA RTX A4000s, it offers additional advantages such as on-premises persistent storage. If you're interested in participating, please feel free to message! submitted by /u/robert67976 [link] [comments]  ( 8 min )
    [R] Inverse Scaling: When Bigger Isn’t Better
An arXiv paper from a cross-collaboration that includes Anthropic, NYU, and StabilityAI investigates cases where LLMs perform more poorly than smaller models. https://kbin.social/m/machinelearning/t/98088 submitted by /u/dojoteef [link] [comments]  ( 8 min )
    [R] “Learning to Generate Better Than Your LLM” Chang & Brantley et al. 2023
Exploring various reinforcement learning and imitation learning algorithms for fine-tuning large language models for text generation. submitted by /u/athens117 [link] [comments]  ( 8 min )
    [P] Aadhar / PAN Card classifier.
https://huggingface.co/spaces/kartik016/aadharORPanClassifier Aadhaar and PAN cards are both popular government IDs in India. I built a classifier that labels them easily and accurately. submitted by /u/UpsetMeal1942 [link] [comments]  ( 8 min )
    [D] Understanding deep monotonic twin networks paper
This is from the paper Estimating the probabilities of causation via deep monotonic twin networks. Suppose we use a GAN or a VAE as the model in this paper, with a dataset like Morpho-MNIST that has known attributes such as thickness and intensity, one of which (say, thickness) serves as the treatment. How are they generating counterfactuals using the algorithm below? It's confusing, and I can't follow how they are doing it. Can someone break it down? https://preview.redd.it/jw91fz7cgb8b1.png?width=505&format=png&auto=webp&s=988b02c08ba7f8adfc3a3a66f881e12fea4d922d https://preview.redd.it/n23l0ztbgb8b1.png?width=559&format=png&auto=webp&s=8fcc232c03f3c0828610b8a514d482a9ed37f1d4 submitted by /u/specializedboy [link] [comments]  ( 8 min )
    [D] What are the major advantages of having deep understanding of ML algorithms?
Let's say I want to build a classification model. To find the best one, I will import all of sklearn's classification algorithms, test all of them with various hyperparameters for each, and then choose the one that performs best on my data. Given that the process of "finding the best model" is just testing all models and their hyperparameters, what benefit do I get from a deep understanding of ML algorithms? Even without knowledge of any algorithm, anyone can import them all, put them in a pipeline, and select the best. I think that a deep understanding of the algorithms can lead to better intuition about what would or wouldn't work, but since the only way to prove it is testing, I can't see that much value in it. What am I missing? submitted by /u/Vinitesa1 [link] [comments]  ( 8 min )
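    For concreteness, the "try everything" workflow the post describes is only a few lines of sklearn (a generic sketch, not anyone's production pipeline):

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Loop over candidate models and keep the best cross-validated one.
candidates = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=2000)),
    "svc": make_pipeline(StandardScaler(), SVC()),
    "rf": RandomForestClassifier(n_estimators=200),
}
scores = {name: cross_val_score(m, X, y, cv=5).mean()
          for name, m in candidates.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

    Note the scaling step: knowing that margin- and gradient-based models need standardized features while trees do not is exactly the kind of understanding that shrinks the search space and catches silent bugs the blind loop never would.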
    [R] DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
DecodingTrust extends the Adversarial GLUE benchmark and aims at providing a thorough assessment of trustworthiness in GPT models. This research endeavor is designed to help researchers and practitioners better understand the capabilities, limitations, and potential risks involved in deploying these state-of-the-art Large Language Models (LLMs). The project is organized around eight primary perspectives of trustworthiness:
* Toxicity
* Stereotype and bias
* Adversarial robustness
* Out-of-distribution robustness
* Privacy
* Robustness to adversarial demonstrations
* Machine ethics
* Fairness
https://kbin.social/m/machinelearning/t/96242 submitted by /u/dojoteef [link] [comments]  ( 8 min )
    [R] Craft an Iron Sword: Dynamically Generating Interactive Game Characters by Prompting Large Language Models Tuned on Code
    Here's some preliminary work from Microsoft from 2022 that incorporates OpenAI's Codex model to make NPCs that can interact with the player using natural language instructions. It works by defining an API of functions the bot can use, then having Codex generate function calls in response to the player's instructions. https://kbin.social/m/machinelearning/t/96056 submitted by /u/dojoteef [link] [comments]  ( 8 min )
  • Open

    Day of AI curriculum meets the moment
    Global participation in MIT RAISE’s free K-12 program more than doubles in its second year.  ( 11 min )
  • Open

    Will coding be a collaborative experience using GitHub copilot? – Part one
GitHub recently released a survey about developer experience which claimed that "AI is here and it's being used at scale. 92% of U.S.-based developers are already using AI coding tools both in and outside of work." This metric (92%) has garnered some attention… Read More »Will coding be a collaborative experience using GitHub copilot? – Part one The post Will coding be a collaborative experience using GitHub copilot? – Part one appeared first on Data Science Central.  ( 20 min )
    Artificial Intelligence and localization: How AI is changing the landscape
    Artificial Intelligence (AI) has emerged as a revolutionary technology that is transforming various industries, and one area where it is making a significant impact is localization. Localization refers to the process of adapting products, services, and content to meet the cultural, linguistic, and functional requirements of a specific target market. With the advent of AI,… Read More »Artificial Intelligence and localization: How AI is changing the landscape The post Artificial Intelligence and localization: How AI is changing the landscape appeared first on Data Science Central.  ( 21 min )
    Application analytics: How to leverage analytics during app creation
    There probably isn’t a better time than now to develop an app for your business. By the end of 2023, mobile apps are expected to generate over $935 billion. Customers are hungry for apps that can provide instant access to services. Of course, simply having an app isn’t good enough.  Consumers will only use your… Read More »Application analytics: How to leverage analytics during app creation The post Application analytics: How to leverage analytics during app creation appeared first on Data Science Central.  ( 23 min )
    Is AI the key to fighting wildfires?
Artificial intelligence is a rapidly evolving technology. Its versatility and unique functions make it a prime candidate for firefighting. In addition, its processing speed is a significant benefit in a situation that demands a rapid response. Could it be the key to fighting wildfires? Is AI-powered wildfire prevention necessary? Wildfires are becoming more common and… Read More »Is AI the key to fighting wildfires? The post Is AI the key to fighting wildfires? appeared first on Data Science Central.  ( 20 min )
  • Open

    Stable-Baselines3 v2.0: Gymnasium Support
After more than a year of effort, Stable-Baselines3 v2.0.0 is out! It comes with Gymnasium support (Gym 0.26/0.21 are still supported via the `shimmy` package). Changelog: https://github.com/DLR-RM/stable-baselines3/releases/tag/v2.0.0 The SB3 ecosystem was also upgraded: SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo Stable-Baselines Jax (SBX): https://github.com/araffin/sbx submitted by /u/araffin2 [link] [comments]  ( 8 min )
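    For anyone upgrading, a minimal end-to-end example with the new Gymnasium API (note the five-tuple returned by step):

```python
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)

# Gymnasium's API: reset() returns (obs, info); step() returns
# (obs, reward, terminated, truncated, info).
obs, info = env.reset()
for _ in range(200):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
```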
    MAC M1 RL Setup
Hey everyone, I was hoping to set up my M1 chip for RL modeling. Is there any post or known way of maxing out the machine to handle a lot of data and a relatively complex RL model? If anyone can point me in the right direction, that would be awesome. Also, AWS options would be "tight" if the Mac setup is a dead end. A humble beginner. submitted by /u/Ethiopia_Forever [link] [comments]  ( 8 min )
    “Learning to Generate Better than Your LLM” Chang & Brantley et al. (2023)
    submitted by /u/athens117 [link] [comments]  ( 8 min )
    Where is the posterior in Maximum a Posteriori Policy Optimisation?
The paper Maximum a Posteriori Policy Optimisation is framed as a MAP problem, but I can't find anything about the posterior, even when pointed to the appendix. Isn't this just a variational approach maximizing the lower bound with flexible choices of the trajectory distribution q(\tau)? submitted by /u/OutOfCharm [link] [comments]  ( 8 min )
  • Open

    Drag equation exponent variation
The motion of a falling body of mass m is given by m dv/dt = mg − kv^r, where the term −kv^r accounts for drag due to air resistance. One can derive r = 2 under simple physical assumptions, but if I remember correctly other values of r may be more realistic in certain circumstances. I don’t know much about the […] Drag equation exponent variation first appeared on John D. Cook.  ( 6 min )
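    For reference, setting dv/dt = 0 in the equation above gives the terminal velocity for a general drag exponent r (a standard computation, not taken from the post itself):

```latex
% Terminal velocity for a general drag exponent r:
% set dv/dt = 0 in m dv/dt = mg - k v^r and solve for v.
\[
  mg - k v_{\mathrm{term}}^{\,r} = 0
  \quad\Longrightarrow\quad
  v_{\mathrm{term}} = \left(\frac{mg}{k}\right)^{1/r}
\]
```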
  • Open

    Test-Time Robust Personalization for Federated Learning. (arXiv:2205.10920v4 [cs.LG] UPDATED)
Federated Learning (FL) is a machine learning paradigm where many clients collaboratively learn a shared global model with decentralized training data. Personalized FL additionally adapts the global model to different clients, achieving promising results on consistent local training and test distributions. However, for real-world personalized FL applications, it is crucial to go one step further: robustifying FL models under the evolving local test set during deployment, where various distribution shifts can arise. In this work, we identify the pitfalls of existing works under test-time distribution shifts and propose Federated Test-time Head Ensemble plus tuning (FedTHE+), which personalizes FL models with robustness to various test-time distribution shifts. We illustrate the advancement of FedTHE+ (and its computationally efficient variant FedTHE) over strong competitors, by training various neural architectures (CNN, ResNet, and Transformer) on CIFAR10 and ImageNet with various test distributions. Along with this, we build a benchmark for assessing the performance and robustness of personalized FL methods during deployment. Code: https://github.com/LINs-lab/FedTHE.
    Efficient Approximations of Complete Interatomic Potentials for Crystal Property Prediction. (arXiv:2306.10045v3 [physics.chem-ph] UPDATED)
    We study property prediction for crystal materials. A crystal structure consists of a minimal unit cell that is repeated infinitely in 3D space. How to accurately represent such repetitive structures in machine learning models remains unresolved. Current methods construct graphs by establishing edges only between nearby nodes, thereby failing to faithfully capture infinite repeating patterns and distant interatomic interactions. In this work, we propose several innovations to overcome these limitations. First, we propose to model physics-principled interatomic potentials directly instead of only using distances as in many existing methods. These potentials include the Coulomb potential, London dispersion potential, and Pauli repulsion potential. Second, we model the complete set of potentials among all atoms, instead of only between nearby atoms as in existing methods. This is enabled by our approximations of infinite potential summations with provable error bounds. We further develop efficient algorithms to compute the approximations. Finally, we propose to incorporate our computations of complete interatomic potentials into message passing neural networks for representation learning. We perform experiments on the JARVIS and Materials Project benchmarks for evaluation. Results show that the use of interatomic potentials and complete interatomic potentials leads to consistent performance improvements with reasonable computational costs. Our code is publicly available as part of the AIRS library (https://github.com/divelab/AIRS/tree/main/OpenMat/PotNet).
    GKD: Generalized Knowledge Distillation for Auto-regressive Sequence Models. (arXiv:2306.13649v1 [cs.LG])
    Knowledge distillation is commonly used for compressing neural networks to reduce their inference cost and memory footprint. However, current distillation methods for auto-regressive models, such as generative language models (LMs), suffer from two key issues: (1) distribution mismatch between output sequences during training and the sequences generated by the student during its deployment, and (2) model under-specification, where the student model may not be expressive enough to fit the teacher's distribution. To address these issues, we propose Generalized Knowledge Distillation (GKD). GKD mitigates distribution mismatch by sampling output sequences from the student during training. Furthermore, GKD handles model under-specification by optimizing alternative divergences, such as reverse KL, that focus on generating samples from the student that are likely under the teacher's distribution. We demonstrate that GKD outperforms commonly-used approaches for distilling LLMs on summarization, machine translation, and arithmetic reasoning tasks.
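    The core of the loss is easy to sketch. A hedged PyTorch fragment of the reverse-KL term computed on student-sampled tokens (shapes and tensors are illustrative; in GKD the logits would come from running both models on sequences the student generated):

```python
import torch
import torch.nn.functional as F

# Illustrative logits over a vocabulary for student-generated tokens.
student_logits = torch.randn(4, 16, 32000, requires_grad=True)  # (batch, seq, vocab)
teacher_logits = torch.randn(4, 16, 32000)                      # teacher is frozen

s_logp = F.log_softmax(student_logits, dim=-1)
t_logp = F.log_softmax(teacher_logits, dim=-1)

# Reverse KL, KL(student || teacher): mode-seeking, so the student is
# pushed to concentrate on outputs the teacher rates as likely.
reverse_kl = (s_logp.exp() * (s_logp - t_logp)).sum(dim=-1).mean()
reverse_kl.backward()  # gradients flow only into the student
```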
    Tourist Attractions Recommendation based on Attention Knowledge Graph Convolution Network. (arXiv:2306.10946v1 [cs.IR] CROSS LISTED)
Recommendation algorithms based on knowledge graphs are at a relatively mature stage. However, problems remain in domain-specific recommendation. For example, in the tourism field, the process of selecting suitable tourist-attraction attributes as the basis for recommendation is complicated. In this paper, we propose an improved Attention Knowledge Graph Convolution Network model, named Att-KGCN, which automatically discovers the semantically neighboring entities of the target scenic spot. The attention layer aggregates relatively similar locations and represents them with an adjacent vector. Then, according to the tourist's preferred choices, the model predicts the probability of similar spots as a recommendation system. The model uses a knowledge graph dataset of tourist attractions based on tourism data from Socotra Island, Yemen. Through experiments, it is verified that the Attention Knowledge Graph Convolution Network has a good effect on the recommendation of tourist attractions and can make better recommendations for tourists' choices.
    PathMLP: Smooth Path Towards High-order Homophily. (arXiv:2306.13532v1 [cs.LG])
    Real-world graphs exhibit increasing heterophily, where nodes no longer tend to be connected to nodes with the same label, challenging the homophily assumption of classical graph neural networks (GNNs) and impeding their performance. Intriguingly, we observe that certain high-order information on heterophilous data exhibits high homophily, which motivates us to involve high-order information in node representation learning. However, common practices in GNNs to acquire high-order information mainly through increasing model depth and altering message-passing mechanisms, which, albeit effective to a certain extent, suffer from three shortcomings: 1) over-smoothing due to excessive model depth and propagation times; 2) high-order information is not fully utilized; 3) low computational efficiency. In this regard, we design a similarity-based path sampling strategy to capture smooth paths containing high-order homophily. Then we propose a lightweight model based on multi-layer perceptrons (MLP), named PathMLP, which can encode messages carried by paths via simple transformation and concatenation operations, and effectively learn node representations in heterophilous graphs through adaptive path aggregation. Extensive experiments demonstrate that our method outperforms baselines on 16 out of 20 datasets, underlining its effectiveness and superiority in alleviating the heterophily problem. In addition, our method is immune to over-smoothing and has high computational efficiency.
    TrustGuard: GNN-based Robust and Explainable Trust Evaluation with Dynamicity Support. (arXiv:2306.13339v1 [cs.LG])
Trust evaluation assesses trust relationships between entities and facilitates decision-making. Machine Learning (ML) shows great potential for trust evaluation owing to its learning capabilities. In recent years, Graph Neural Networks (GNNs), as a new ML paradigm, have demonstrated superiority in dealing with graph data. This has motivated researchers to explore their use in trust evaluation, as trust relationships among entities can be modeled as a graph. However, current trust evaluation methods that employ GNNs fail to fully capture the dynamic nature of trust, overlook the adverse effects of attacks on trust evaluation, and cannot provide convincing explanations on evaluation results. To address these problems, in this paper, we propose TrustGuard, a GNN-based accurate trust evaluation model that supports trust dynamicity, is robust against typical attacks, and provides explanations through visualization. Specifically, TrustGuard is designed with a layered architecture that contains a snapshot input layer, a spatial aggregation layer, a temporal aggregation layer, and a prediction layer. Among them, the spatial aggregation layer can be plugged into a defense mechanism for a robust aggregation of local trust relationships, and the temporal aggregation layer applies an attention mechanism for effective learning of temporal patterns. Extensive experiments on two real-world datasets show that TrustGuard outperforms state-of-the-art GNN-based trust evaluation models with respect to trust prediction across single-timeslot and multi-timeslot, even in the presence of attacks. In particular, TrustGuard can explain its evaluation results by visualizing both spatial and temporal views.
    Comparing the Efficacy of Fine-Tuning and Meta-Learning for Few-Shot Policy Imitation. (arXiv:2306.13554v1 [cs.LG])
    In this paper we explore few-shot imitation learning for control problems, which involves learning to imitate a target policy by accessing a limited set of offline rollouts. This setting has been relatively under-explored despite its relevance to robotics and control applications. State-of-the-art methods developed to tackle few-shot imitation rely on meta-learning, which is expensive to train as it requires access to a distribution over tasks (rollouts from many target policies and variations of the base environment). Given this limitation we investigate an alternative approach, fine-tuning, a family of methods that pretrain on a single dataset and then fine-tune on unseen domain-specific data. Recent work has shown that fine-tuners outperform meta-learners in few-shot image classification tasks, especially when the data is out-of-domain. Here we evaluate to what extent this is true for control problems, proposing a simple yet effective baseline which relies on two stages: (i) training a base policy online via reinforcement learning (e.g. Soft Actor-Critic) on a single base environment, (ii) fine-tuning the base policy via behavioral cloning on a few offline rollouts of the target policy. Despite its simplicity this baseline is competitive with meta-learning methods on a variety of conditions and is able to imitate target policies trained on unseen variations of the original environment. Importantly, the proposed approach is practical and easy to implement, as it does not need any complex meta-training protocol. As a further contribution, we release an open source dataset called iMuJoCo (iMitation MuJoCo) consisting of 154 variants of popular OpenAI-Gym MuJoCo environments with associated pretrained target policies and rollouts, which can be used by the community to study few-shot imitation learning and offline reinforcement learning.
    Physics-informed neural networks modeling for systems with moving immersed boundaries: application to an unsteady flow past a plunging foil. (arXiv:2306.13395v1 [physics.flu-dyn])
Recently, physics informed neural networks (PINNs) have been explored extensively for solving various forward and inverse problems and facilitating querying applications in fluid mechanics. However, work on PINNs for unsteady flows past moving bodies, such as flapping wings, is scarce. Earlier studies mostly relied on transferring to a body-attached frame of reference, which is restrictive for handling multiple moving bodies or deforming structures. Hence, in the present work, an immersed boundary aware framework has been explored for developing surrogate models for unsteady flows past moving bodies. Specifically, simultaneous pressure recovery and velocity reconstruction from Immersed boundary method (IBM) simulation data has been investigated. While the efficacy of velocity reconstruction has been tested against the fine-resolution IBM data, as a step further, the recovered pressure was compared with that of an arbitrary Lagrangian-Eulerian (ALE) based solver. Under this framework, two PINN variants, (i) a moving-boundary-enabled standard Navier-Stokes based PINN (MB-PINN), and (ii) a moving-boundary-enabled IBM based PINN (MB-IBM-PINN), have been formulated. A fluid-solid partitioning of the physics losses in MB-IBM-PINN has been allowed, in order to investigate the effects of solid body points while training. This enables MB-IBM-PINN to match the performance of MB-PINN under certain loss weighting conditions. MB-PINN is found to be superior to MB-IBM-PINN when a priori knowledge of the solid body position and velocity is available. To improve the data efficiency of MB-PINN, a physics-based data sampling technique has also been investigated. It is observed that a suitable combination of physics constraint relaxation and physics-based sampling can achieve a model performance comparable to the case of using all the data points, under a fixed training budget.
    Torsion Graph Neural Networks. (arXiv:2306.13541v1 [cs.LG])
    Geometric deep learning (GDL) models have demonstrated a great potential for the analysis of non-Euclidian data. They are developed to incorporate the geometric and topological information of non-Euclidian data into the end-to-end deep learning architectures. Motivated by the recent success of discrete Ricci curvature in graph neural network (GNNs), we propose TorGNN, an analytic Torsion enhanced Graph Neural Network model. The essential idea is to characterize graph local structures with an analytic torsion based weight formula. Mathematically, analytic torsion is a topological invariant that can distinguish spaces which are homotopy equivalent but not homeomorphic. In our TorGNN, for each edge, a corresponding local simplicial complex is identified, then the analytic torsion (for this local simplicial complex) is calculated, and further used as a weight (for this edge) in message-passing process. Our TorGNN model is validated on link prediction tasks from sixteen different types of networks and node classification tasks from three types of networks. It has been found that our TorGNN can achieve superior performance on both tasks, and outperform various state-of-the-art models. This demonstrates that analytic torsion is a highly efficient topological invariant in the characterization of graph structures and can significantly boost the performance of GNNs.
    Revisiting Automated Prompting: Are We Actually Doing Better?. (arXiv:2304.03609v2 [cs.CL] UPDATED)
    Current literature demonstrates that Large Language Models (LLMs) are great few-shot learners, and prompting significantly increases their performance on a range of downstream tasks in a few-shot learning setting. An attempt to automate human-led prompting followed, with some progress achieved. In particular, subsequent work demonstrates automation can outperform fine-tuning in certain K-shot learning scenarios. In this paper, we revisit techniques for automated prompting on six different downstream tasks and a larger range of K-shot learning settings. We find that automated prompting does not consistently outperform simple manual prompts. Our work suggests that, in addition to fine-tuning, manual prompts should be used as a baseline in this line of research.
    Convergence of First-Order Methods for Constrained Nonconvex Optimization with Dependent Data. (arXiv:2203.15797v2 [math.OC] UPDATED)
We focus on analyzing the classical stochastic projected gradient methods under a general dependent data sampling scheme for constrained smooth nonconvex optimization. We show the worst-case rate of convergence $\tilde{O}(t^{-1/4})$ and complexity $\tilde{O}(\varepsilon^{-4})$ for achieving an $\varepsilon$-near stationary point in terms of the norm of the gradient of the Moreau envelope and gradient mapping. While the classical convergence guarantee requires i.i.d. data sampling from the target distribution, we only require a mild mixing condition of the conditional distribution, which holds for a wide class of Markov chain sampling algorithms. This improves the existing complexity for constrained smooth nonconvex optimization with dependent data from $\tilde{O}(\varepsilon^{-8})$ to $\tilde{O}(\varepsilon^{-4})$ with a significantly simpler analysis. We illustrate the generality of our approach by deriving convergence results with dependent data for stochastic proximal gradient methods, the adaptive stochastic gradient algorithm AdaGrad, and the stochastic gradient algorithm with heavy ball momentum. As an application, we obtain the first online nonnegative matrix factorization algorithms for dependent data based on stochastic projected gradient methods with adaptive step sizes and an optimal rate of convergence.
    Adversarial Robustness Certification for Bayesian Neural Networks. (arXiv:2306.13614v1 [cs.LG])
    We study the problem of certifying the robustness of Bayesian neural networks (BNNs) to adversarial input perturbations. Given a compact set of input points $T \subseteq \mathbb{R}^m$ and a set of output points $S \subseteq \mathbb{R}^n$, we define two notions of robustness for BNNs in an adversarial setting: probabilistic robustness and decision robustness. Probabilistic robustness is the probability that for all points in $T$ the output of a BNN sampled from the posterior is in $S$. On the other hand, decision robustness considers the optimal decision of a BNN and checks if for all points in $T$ the optimal decision of the BNN for a given loss function lies within the output set $S$. Although exact computation of these robustness properties is challenging due to the probabilistic and non-convex nature of BNNs, we present a unified computational framework for efficiently and formally bounding them. Our approach is based on weight interval sampling, integration, and bound propagation techniques, and can be applied to BNNs with a large number of parameters, and independently of the (approximate) inference method employed to train the BNN. We evaluate the effectiveness of our methods on various regression and classification tasks, including an industrial regression benchmark, MNIST, traffic sign recognition, and airborne collision avoidance, and demonstrate that our approach enables certification of robustness and uncertainty of BNN predictions.
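    As a point of contrast with the paper's formal bound-propagation approach, probabilistic robustness also has a naive Monte Carlo estimator, which is a useful sanity check even though it certifies nothing. A hedged sketch, where sample_posterior_net and in_S are assumed, hypothetical helpers:

```python
import torch

def probabilistic_robustness(sample_posterior_net, T, in_S, n_samples=500):
    """Fraction of posterior weight samples whose network maps every point
    of T into the safe output set S. Plain Monte Carlo, not a certified bound."""
    hits = 0
    for _ in range(n_samples):
        net = sample_posterior_net()      # assumed: draws weights from the posterior
        with torch.no_grad():
            outputs = net(T)              # T: tensor of input points
        if in_S(outputs).all():           # in_S: elementwise safety predicate
            hits += 1
    return hits / n_samples
```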
    Training with Mixed-Precision Floating-Point Assignments. (arXiv:2301.13464v2 [cs.LG] UPDATED)
    When training deep neural networks, keeping all tensors in high precision (e.g., 32-bit or even 16-bit floats) is often wasteful. However, keeping all tensors in low precision (e.g., 8-bit floats) can lead to unacceptable accuracy loss. Hence, it is important to use a precision assignment -- a mapping from all tensors (arising in training) to precision levels (high or low) -- that keeps most of the tensors in low precision and leads to sufficiently accurate models. We provide a technique that explores this memory-accuracy tradeoff by generating precision assignments for convolutional neural networks that (i) use less memory and (ii) lead to more accurate convolutional networks at the same time, compared to the precision assignments considered by prior work in low-precision floating-point training. We evaluate our technique on image classification tasks by training convolutional networks on CIFAR-10, CIFAR-100, and ImageNet. Our method typically provides > 2x memory reduction over a baseline precision assignment while preserving training accuracy, and gives further reductions by trading off accuracy. Compared to other baselines which sometimes cause training to diverge, our method provides similar or better memory reduction while avoiding divergence.
    Divided Attention: Unsupervised Multi-Object Discovery with Contextually Separated Slots. (arXiv:2304.01430v2 [cs.CV] UPDATED)
    We introduce a method to segment the visual field into independently moving regions, trained with no ground truth or supervision. It consists of an adversarial conditional encoder-decoder architecture based on Slot Attention, modified to use the image as context to decode optical flow without attempting to reconstruct the image itself. In the resulting multi-modal representation, one modality (flow) feeds the encoder to produce separate latent codes (slots), whereas the other modality (image) conditions the decoder to generate the first (flow) from the slots. This design frees the representation from having to encode complex nuisance variability in the image due to, for instance, illumination and reflectance properties of the scene. Since customary autoencoding based on minimizing the reconstruction error does not preclude the entire flow from being encoded into a single slot, we modify the loss to an adversarial criterion based on Contextual Information Separation. The resulting min-max optimization fosters the separation of objects and their assignment to different attention slots, leading to Divided Attention, or DivA. DivA outperforms recent unsupervised multi-object motion segmentation methods while tripling run-time speed up to 104FPS and reducing the performance gap from supervised methods to 12% or less. DivA can handle different numbers of objects and different image sizes at training and test time, is invariant to permutation of object labels, and does not require explicit regularization.
    Dual RL: Unification and New Methods for Reinforcement and Imitation Learning. (arXiv:2302.08560v2 [cs.LG] UPDATED)
The goal of reinforcement learning (RL) is to maximize the expected cumulative return. It has been shown that this objective can be represented by an optimization problem of the state-action visitation distribution under linear constraints. The dual problem of this formulation, which we refer to as dual RL, is unconstrained and easier to optimize. We show that several state-of-the-art off-policy deep reinforcement learning (RL) algorithms, under both online and offline RL and imitation learning (IL) settings, can be viewed as dual RL approaches in a unified framework. This unification provides a common ground to study and identify the components that contribute to the success of these methods and also reveals the common shortcomings across methods with new insights for improvement. Our analysis shows that prior off-policy imitation learning methods are based on an unrealistic coverage assumption and are minimizing a particular f-divergence between the visitation distributions of the learned policy and the expert policy. We propose a new method using a simple modification to the dual RL framework that allows for performant imitation learning with arbitrary off-policy data to obtain near-expert performance, without learning a discriminator. Further, by framing a recent SOTA offline RL method XQL in the dual RL framework, we propose alternative choices to replace the Gumbel regression loss, which achieve improved performance and resolve the training instability issue of XQL. Project code and details can be found at https://hari-sikchi.github.io/dual-rl.
    Predicting Temporal Aspects of Movement for Predictive Replication in Fog Environments. (arXiv:2306.00575v3 [cs.DC] UPDATED)
    To fully exploit the benefits of the fog environment, efficient management of data locality is crucial. Blind or reactive data replication falls short in harnessing the potential of fog computing, necessitating more advanced techniques for predicting where and when clients will connect. While spatial prediction has received considerable attention, temporal prediction remains understudied. Our paper addresses this gap by examining the advantages of incorporating temporal prediction into existing spatial prediction models. We also provide a comprehensive analysis of spatio-temporal prediction models, such as Deep Neural Networks and Markov models, in the context of predictive replication. We propose a novel model using Holt-Winter's Exponential Smoothing for temporal prediction, leveraging sequential and periodical user movement patterns. In a fog network simulation with real user trajectories our model achieves a 15% reduction in excess data with a marginal 1% decrease in data availability.
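    Holt-Winters smoothing is available off the shelf; below is a small sketch on synthetic hourly data with a daily period, standing in for the paper's client-connection series (the synthetic series and hyperparameters are illustrative, not the paper's setup):

```python
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Two weeks of synthetic hourly connection counts with a 24-hour cycle.
t = np.arange(24 * 14, dtype=float)
series = 10 + 5 * np.sin(2 * np.pi * t / 24) + np.random.randn(t.size)

fit = ExponentialSmoothing(series, trend="add", seasonal="add",
                           seasonal_periods=24).fit()
next_day = fit.forecast(24)   # temporal prediction for the next 24 hours
```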
    FPGA Implementation of Convolutional Neural Network for Real-Time Handwriting Recognition. (arXiv:2306.13557v1 [cs.AR])
    Machine Learning (ML) has recently been a skyrocketing field in Computer Science. As computer hardware engineers, we are enthusiastic about hardware implementations of popular software ML architectures to optimize their performance, reliability, and resource usage. In this project, we designed a highly-configurable, real-time device for recognizing handwritten letters and digits using an Altera DE1 FPGA Kit. We followed various engineering standards, including IEEE-754 32-bit Floating-Point Standard, Video Graphics Array (VGA) display protocol, Universal Asynchronous Receiver-Transmitter (UART) protocol, and Inter-Integrated Circuit (I2C) protocols to achieve the project goals. These significantly improved our design in compatibility, reusability, and simplicity in verifications. Following these standards, we designed a 32-bit floating-point (FP) instruction set architecture (ISA). We developed a 5-stage RISC processor in System Verilog to manage image processing, matrix multiplications, ML classifications, and user interfaces. Three different ML architectures were implemented and evaluated on our design: Linear Classification (LC), a 784-64-10 fully connected neural network (NN), and a LeNet-like Convolutional Neural Network (CNN) with ReLU activation layers and 36 classes (10 for the digits and 26 for the case-insensitive letters). The training processes were done in Python scripts, and the resulting kernels and weights were stored in hex files and loaded into the FPGA's SRAM units. Convolution, pooling, data management, and various other ML features were guided by firmware in our custom assembly language. This paper documents the high-level design block diagrams, interfaces between each System Verilog module, implementation details of our software and firmware components, and further discussions on potential impacts.
    Image Shortcut Squeezing: Countering Perturbative Availability Poisons with Compression. (arXiv:2301.13838v2 [cs.CR] UPDATED)
    Perturbative availability poisons (PAPs) add small changes to images to prevent their use for model training. Current research adopts the belief that practical and effective approaches to countering PAPs do not exist. In this paper, we argue that it is time to abandon this belief. We present extensive experiments showing that 12 state-of-the-art PAP methods are vulnerable to Image Shortcut Squeezing (ISS), which is based on simple compression. For example, on average, ISS restores the CIFAR-10 model accuracy to $81.73\%$, surpassing the previous best preprocessing-based countermeasures by $37.97\%$ absolute. ISS also (slightly) outperforms adversarial training and has higher generalizability to unseen perturbation norms and also higher efficiency. Our investigation reveals that the property of PAP perturbations depends on the type of surrogate model used for poison generation, and it explains why a specific ISS compression yields the best performance for a specific type of PAP perturbation. We further test stronger, adaptive poisoning, and show it falls short of being an ideal defense against ISS. Overall, our results demonstrate the importance of considering various (simple) countermeasures to ensure the meaningfulness of analysis carried out during the development of PAP methods.
    ExpPoint-MAE: Better interpretability and performance for self-supervised point cloud transformers. (arXiv:2306.10798v2 [cs.CV] UPDATED)
In this paper we delve into the properties of transformers, attained through self-supervision, in the point cloud domain. Specifically, we evaluate the effectiveness of Masked Autoencoding as a pretraining scheme, and explore Momentum Contrast as an alternative. In our study we investigate the impact of data quantity on the learned features, and uncover similarities in the transformer's behavior across domains. Through comprehensive visualizations, we observe that the transformer learns to attend to semantically meaningful regions, indicating that pretraining leads to a better understanding of the underlying geometry. Moreover, we examine the finetuning process and its effect on the learned representations. Based on that, we devise an unfreezing strategy which consistently outperforms our baseline without introducing any other modifications to the model or the training pipeline, and achieve state-of-the-art results in the classification task among transformer models.
    Bring Your Own Data! Self-Supervised Evaluation for Large Language Models. (arXiv:2306.13651v1 [cs.CL])
    With the rise of Large Language Models (LLMs) and their ubiquitous deployment in diverse domains, measuring language model behavior on realistic data is imperative. For example, a company deploying a client-facing chatbot must ensure that the model will not respond to client requests with profanity. Current evaluations approach this problem using small, domain-specific datasets with human-curated labels. These evaluation sets are often sampled from a narrow and simplified distribution, and data sources can unknowingly be leaked into the training set which can lead to misleading evaluations. To bypass these drawbacks, we propose a framework for self-supervised evaluation of LLMs by analyzing their sensitivity or invariance to transformations on the input text. Self-supervised evaluation can directly monitor LLM behavior on datasets collected in the wild or streamed during live model deployment. We demonstrate self-supervised evaluation strategies for measuring closed-book knowledge, toxicity, and long-range context dependence, in addition to sensitivity to grammatical structure and tokenization errors. When comparisons to similar human-labeled benchmarks are available, we find strong correlations between self-supervised and human-supervised evaluations. The self-supervised paradigm complements current evaluation strategies that rely on labeled data.
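    The invariance idea is easy to operationalize with any causal LM. A hedged sketch using a case transformation as the meaning-preserving perturbation (the model choice and this specific transform are illustrative, not the paper's exact setup):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def avg_logprob(text):
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)     # loss is mean token cross-entropy
    return -out.loss.item()

original = "The meeting was moved to Friday afternoon."
perturbed = original.upper()             # meaning-preserving transformation
sensitivity = abs(avg_logprob(original) - avg_logprob(perturbed))
print(sensitivity)  # a large gap suggests brittleness to surface form
```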
    Transformed Distribution Matching for Missing Value Imputation. (arXiv:2302.10363v2 [cs.LG] UPDATED)
    We study the problem of imputing missing values in a dataset, which has important applications in many domains. The key to missing value imputation is to capture the data distribution with incomplete samples and impute the missing values accordingly. In this paper, by leveraging the fact that any two batches of data with missing values come from the same data distribution, we propose to impute the missing values of two batches of samples by transforming them into a latent space through deep invertible functions and matching them distributionally. To learn the transformations and impute the missing values simultaneously, a simple and well-motivated algorithm is proposed. Our algorithm has fewer hyperparameters to fine-tune and generates high-quality imputations regardless of how missing values are generated. Extensive experiments over a large number of datasets and competing benchmark algorithms show that our method achieves state-of-the-art performance.
    Improving Signed Propagation for Graph Neural Networks. (arXiv:2301.08918v4 [cs.LG] UPDATED)
Message-passing Graph Neural Networks (GNNs), which collect information from adjacent nodes, achieve dismal performance on heterophilic graphs. Various schemes have been proposed to solve this problem, and propagating signed information on heterophilic edges has gained great attention. Recently, some works provided theoretical analysis that signed propagation always leads to performance improvement under a binary class scenario. However, we notice that prior analyses do not align well with multi-class benchmark datasets. This paper provides a new understanding of signed propagation for multi-class scenarios and points out two drawbacks in terms of message-passing and parameter update: (1) Message-passing: if two nodes belong to different classes but have a high similarity, signed propagation can decrease the separability. (2) Parameter update: the prediction uncertainty (e.g., conflict evidence) of signed neighbors increases during training, which can impede the stability of the algorithm. Based on the observation, we introduce two novel strategies for improving signed propagation under multi-class graphs. The proposed scheme combines calibration to secure robustness while reducing uncertainty. We show the efficacy of our theorem through extensive experiments on six benchmark graph datasets.
    Gradient Descent with Linearly Correlated Noise: Theory and Applications to Differential Privacy. (arXiv:2302.01463v2 [cs.LG] UPDATED)
    We study gradient descent under linearly correlated noise. Our work is motivated by recent practical methods for optimization with differential privacy (DP), such as DP-FTRL, which achieve strong performance in settings where privacy amplification techniques are infeasible (such as in federated learning). These methods inject privacy noise through a matrix factorization mechanism, making the noise linearly correlated over iterations. We propose a simplified setting that distills key facets of these methods and isolates the impact of linearly correlated noise. We analyze the behavior of gradient descent in this setting, for both convex and non-convex functions. Our analysis is demonstrably tighter than prior work and recovers multiple important special cases exactly (including anticorrelated perturbed gradient descent). We use our results to develop new, effective matrix factorizations for differentially private optimization, and highlight the benefits of these factorizations theoretically and empirically.
    Scaling MLPs: A Tale of Inductive Bias. (arXiv:2306.13575v1 [cs.LG])
    In this work we revisit the most fundamental building block in deep learning, the multi-layer perceptron (MLP), and study the limits of its performance on vision tasks. Empirical insights into MLPs are important for multiple reasons. (1) Given the recent narrative "less inductive bias is better", popularized due to transformers eclipsing convolutional models, it is natural to explore the limits of this hypothesis. To that end, MLPs offer an ideal test bed, being completely free of any inductive bias. (2) MLPs have almost exclusively been the main protagonist in the deep learning theory literature due to their mathematical simplicity, serving as a proxy to explain empirical phenomena observed for more complex architectures. Surprisingly, experimental datapoints for MLPs are very difficult to find in the literature, especially when coupled with large pre-training protocols. This discrepancy between practice and theory is worrying: Do MLPs reflect the empirical advances exhibited by practical models? Or do theorists need to rethink the role of MLPs as a proxy? We provide insights into both these aspects. We show that the performance of MLPs drastically improves with scale (93% on CIFAR10, 79% on CIFAR100, 69% on TinyImageNet), highlighting that lack of inductive bias can indeed be compensated. We observe that MLPs mimic the behaviour of their modern counterparts faithfully, with some components in the learning setting however surprisingly exhibiting stronger or unexpected behaviours. Due to their inherent computational efficiency, large pre-training experiments become more accessible for academic researchers. All of our experiments were run on a single GPU.
    Penalty Gradient Normalization for Generative Adversarial Networks. (arXiv:2306.13576v1 [cs.CV])
In this paper, we propose a novel normalization method called penalty gradient normalization (PGN) to tackle the training instability of Generative Adversarial Networks (GANs) caused by the sharp gradient space. Unlike existing work such as gradient penalty and spectral normalization, the proposed PGN only imposes a penalty gradient norm constraint on the discriminator function, which increases the capacity of the discriminator. Moreover, the proposed penalty gradient normalization can be applied to different GAN architectures with little modification. Extensive experiments on three datasets show that GANs trained with penalty gradient normalization outperform existing methods in terms of both Frechet Inception Distance and Inception Score.
    Linear-scaling kernels for protein sequences and small molecules outperform deep learning while providing uncertainty quantitation and improved interpretability. (arXiv:2302.03294v2 [cs.LG] UPDATED)
Gaussian process (GP) is a Bayesian model which provides several advantages for regression tasks in machine learning such as reliable quantitation of uncertainty and improved interpretability. Their adoption has been precluded by their excessive computational cost and by the difficulty in adapting them for analyzing sequences (e.g. amino acid and nucleotide sequences) and graphs (e.g. ones representing small molecules). In this study, we develop efficient and scalable approaches for fitting GP models as well as fast convolution kernels which scale linearly with graph or sequence size. We implement these improvements by building an open-source Python library called xGPR. We compare the performance of xGPR with the reported performance of various deep learning models on 20 benchmarks, including small molecule, protein sequence and tabular data. We show that xGPR achieves highly competitive performance with much shorter training time. Furthermore, we also develop new kernels for sequence and graph data and show that xGPR generally outperforms convolutional neural networks on predicting key properties of proteins and small molecules. Importantly, xGPR provides uncertainty information not available from typical deep learning models. Additionally, xGPR provides a representation of the input data that can be used for clustering and data visualization. These results demonstrate that xGPR provides a powerful and generic tool that can be broadly useful in protein engineering and drug discovery.
    Inversion by Direct Iteration: An Alternative to Denoising Diffusion for Image Restoration. (arXiv:2303.11435v3 [eess.IV] UPDATED)
Inversion by Direct Iteration (InDI) is a new formulation for supervised image restoration that avoids the so-called "regression to the mean" effect and produces more realistic and detailed images than existing regression-based methods. It does this by gradually improving image quality in small steps, similar to generative denoising diffusion models. Image restoration is an ill-posed problem where multiple high-quality images are plausible reconstructions of a given low-quality input. Therefore, the outcome of a single-step regression model is typically an aggregate of all possible explanations, and thus lacks details and realism. The main advantage of InDI is that it does not try to predict the clean target image in a single step but instead gradually improves the image in small steps, resulting in better perceptual quality. While generative denoising diffusion models also work in small steps, our formulation is distinct in that it does not require knowledge of any analytic form of the degradation process. Instead, we directly learn an iterative restoration process from low-quality and high-quality paired examples. InDI can be applied to virtually any image degradation, given paired training data. In conditional denoising diffusion image restoration the denoising network generates the restored image by repeatedly denoising an initial image of pure noise, conditioned on the degraded input. Contrary to conditional denoising formulations, InDI directly proceeds by iteratively restoring the input low-quality image, producing high-quality results on a variety of image restoration tasks, including motion and out-of-focus deblurring, super-resolution, compression artifact removal, and denoising.
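    Inference is then just a short loop. A hedged sketch, assuming a trained restorer F(x_t, t) that estimates the clean image, and using the convex-combination update x_{t-d} = (d/t) F(x_t, t) + (1 - d/t) x_t that one reading of the paper's small-step formulation suggests (the exact update should be checked against the paper):

```python
import torch

def indi_restore(F, y, n_steps=20):
    """Iteratively restore degraded image y; t runs from 1 down to 0."""
    x, t = y, 1.0
    d = 1.0 / n_steps
    for _ in range(n_steps):
        with torch.no_grad():
            x0_hat = F(x, torch.tensor(t))        # network's clean-image estimate
        x = (d / t) * x0_hat + (1.0 - d / t) * x  # small step toward the estimate
        t -= d
    return x
```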
    Instance-Adaptive Video Compression: Improving Neural Codecs by Training on the Test Set. (arXiv:2111.10302v2 [eess.IV] UPDATED)
    We introduce a video compression algorithm based on instance-adaptive learning. On each video sequence to be transmitted, we finetune a pretrained compression model. The optimal parameters are transmitted to the receiver along with the latent code. By entropy-coding the parameter updates under a suitable mixture model prior, we ensure that the network parameters can be encoded efficiently. This instance-adaptive compression algorithm is agnostic about the choice of base model and has the potential to improve any neural video codec. On UVG, HEVC, and Xiph datasets, our codec improves the performance of a scale-space flow model by between 21% and 27% BD-rate savings, and that of a state-of-the-art B-frame model by 17 to 20% BD-rate savings. We also demonstrate that instance-adaptive finetuning improves the robustness to domain shift. Finally, our approach reduces the capacity requirements of compression models. We show that it enables a competitive performance even after reducing the network size by 70%.
    Curvature Filtrations for Graph Generative Model Evaluation. (arXiv:2301.12906v2 [cs.LG] UPDATED)
    Graph generative model evaluation necessitates understanding differences between graphs on the distributional level. This entails being able to harness salient attributes of graphs in an efficient manner. Curvature constitutes one such property of graphs, and has recently started to prove useful in characterising graphs. Its expressive properties, stability, and practical utility in model evaluation remain largely unexplored, however. We combine graph curvature descriptors with emerging methods from topological data analysis to obtain robust, expressive descriptors for evaluating graph generative models.
    Active Coverage for PAC Reinforcement Learning. (arXiv:2306.13601v1 [cs.LG])
    Collecting and leveraging data with good coverage properties plays a crucial role in different aspects of reinforcement learning (RL), including reward-free exploration and offline learning. However, the notion of "good coverage" really depends on the application at hand, as data suitable for one context may not be so for another. In this paper, we formalize the problem of active coverage in episodic Markov decision processes (MDPs), where the goal is to interact with the environment so as to fulfill given sampling requirements. This framework is sufficiently flexible to specify any desired coverage property, making it applicable to any problem that involves online exploration. Our main contribution is an instance-dependent lower bound on the sample complexity of active coverage and a simple game-theoretic algorithm, CovGame, that nearly matches it. We then show that CovGame can be used as a building block to solve different PAC RL tasks. In particular, we obtain a simple algorithm for PAC reward-free exploration with an instance-dependent sample complexity that, in certain MDPs which are "easy to explore", is lower than the minimax one. By further coupling this exploration algorithm with a new technique to do implicit eliminations in policy space, we obtain a computationally-efficient algorithm for best-policy identification whose instance-dependent sample complexity scales with gaps between policy values.
    NetBooster: Empowering Tiny Deep Learning By Standing on the Shoulders of Deep Giants. (arXiv:2306.13586v1 [cs.LG])
    Tiny deep learning has attracted increasing attention driven by the substantial demand for deploying deep learning on numerous intelligent Internet-of-Things devices. However, it is still challenging to unleash tiny deep learning's full potential on both large-scale datasets and downstream tasks due to the under-fitting issues caused by the limited model capacity of tiny neural networks (TNNs). To this end, we propose a framework called NetBooster to empower tiny deep learning by augmenting the architectures of TNNs via an expansion-then-contraction strategy. Extensive experiments show that NetBooster consistently outperforms state-of-the-art tiny deep learning solutions.  ( 2 min )
    Adversarial Resilience in Sequential Prediction via Abstention. (arXiv:2306.13119v1 [cs.LG])
We study the problem of sequential prediction in the stochastic setting with an adversary that is allowed to inject clean-label adversarial (or out-of-distribution) examples. Algorithms designed to handle purely stochastic data tend to fail in the presence of such adversarial examples, often leading to erroneous predictions. This is undesirable in many high-stakes applications such as medical recommendations, where abstaining from predictions on adversarial examples is preferable to misclassification. On the other hand, assuming fully adversarial data leads to very pessimistic bounds that are often vacuous in practice. To capture this motivation, we propose a new model of sequential prediction that sits between the purely stochastic and fully adversarial settings by allowing the learner to abstain from making a prediction at no cost on adversarial examples. Assuming access to the marginal distribution on the non-adversarial examples, we design a learner whose error scales with the VC dimension (mirroring the stochastic setting) of the hypothesis class, as opposed to the Littlestone dimension which characterizes the fully adversarial setting. Furthermore, we design a learner for VC dimension 1 classes, which works even in the absence of access to the marginal distribution. Our key technical contribution is a novel measure for quantifying uncertainty for learning VC classes, which may be of independent interest.
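    The abstention mechanic itself is easy to illustrate in toy form: predict only when the input looks plausible under the known marginal, abstain otherwise. A sketch where a simple z-score density test stands in for the paper's actual uncertainty measure:

```python
import numpy as np

def predict_or_abstain(x, classifier, mean, std, z_max=3.0):
    """Abstain (return None) on inputs implausible under the marginal."""
    if np.any(np.abs((x - mean) / std) > z_max):
        return None          # likely adversarial / out-of-distribution
    return classifier(x)
```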
    Prediction of Annual Snow Accumulation Using a Recurrent Graph Convolutional Approach. (arXiv:2306.13181v1 [cs.LG])
The precise tracking and prediction of polar ice layers can unveil historic trends in snow accumulation. In recent years, airborne radar sensors, such as the Snow Radar, have been shown to be able to measure these internal ice layers over large areas with a fine vertical resolution. In our previous work, we found that temporal graph convolutional networks perform reasonably well in predicting future snow accumulation when given temporal graphs containing deep ice layer thickness. In this work, we experiment with a graph attention network-based model and use it to predict more annual snow accumulation data points with fewer input data points on a larger dataset. We find that these changes only slightly reduce performance.  ( 2 min )
    Margin Maximization in Attention Mechanism. (arXiv:2306.13596v1 [cs.LG])
The attention mechanism is a central component of the transformer architecture, which led to the phenomenal success of large language models. However, the theoretical principles underlying the attention mechanism are poorly understood, especially its nonconvex optimization dynamics. In this work, we explore the seminal softmax-attention model $f(\boldsymbol{X})=\langle \boldsymbol{Xv}, \texttt{softmax}(\boldsymbol{XWp})\rangle$, where $\boldsymbol{X}$ is the token sequence and $(\boldsymbol{v},\boldsymbol{W},\boldsymbol{p})$ are tunable parameters. We prove that running gradient descent on $\boldsymbol{p}$, or equivalently $\boldsymbol{W}$, converges in direction to a max-margin solution that separates $\textit{locally-optimal}$ tokens from non-optimal ones. This clearly formalizes attention as a token separation mechanism. Remarkably, our results are applicable to general data and precisely characterize $\textit{optimality}$ of tokens in terms of the value embeddings $\boldsymbol{Xv}$ and problem geometry. We also provide a broader regularization path analysis that establishes the margin-maximizing nature of attention even for nonlinear prediction heads. When optimizing $\boldsymbol{v}$ and $\boldsymbol{p}$ simultaneously with logistic loss, we identify conditions under which the regularization paths directionally converge to their respective hard-margin SVM solutions where $\boldsymbol{v}$ separates the input features based on their labels. Interestingly, the SVM formulation of $\boldsymbol{p}$ is influenced by the support vector geometry of $\boldsymbol{v}$. Finally, we verify our theoretical findings via numerical experiments and provide insights.
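For concreteness, here is a minimal NumPy sketch of the single-head model $f(\boldsymbol{X})=\langle \boldsymbol{Xv}, \texttt{softmax}(\boldsymbol{XWp})\rangle$ studied above; the toy dimensions and random initialization are illustrative only and are not taken from the paper.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())                      # numerically stable softmax
    return e / e.sum()

def f(X, v, W, p):
    """f(X) = <Xv, softmax(XWp)>: one scalar output per token sequence."""
    attn = softmax(X @ W @ p)                    # attention weights over tokens
    return (X @ v) @ attn                        # attention-weighted value score

rng = np.random.default_rng(0)
T, d = 5, 3                                      # T tokens of dimension d
X = rng.normal(size=(T, d))
v, p = rng.normal(size=d), rng.normal(size=d)
W = rng.normal(size=(d, d))
print(f(X, v, W, p))
```

Gradient descent on p (with v, W fixed) in this model is exactly the setting whose implicit max-margin bias the paper analyzes.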
    CLUE: Calibrated Latent Guidance for Offline Reinforcement Learning. (arXiv:2306.13412v1 [cs.LG])
Offline reinforcement learning (RL) aims to learn an optimal policy from pre-collected and labeled datasets, which eliminates the time-consuming data collection in online RL. However, offline RL still bears a large burden of specifying/handcrafting extrinsic rewards for each transition in the offline data. As a remedy for the labor-intensive labeling, we propose to endow offline RL tasks with a few expert data and utilize the limited expert data to drive intrinsic rewards, thus eliminating the need for extrinsic rewards. To achieve that, we introduce Calibrated Latent gUidancE (CLUE), which utilizes a conditional variational auto-encoder to learn a latent space such that intrinsic rewards can be directly quantified over the latent space. CLUE's key idea is to align the intrinsic rewards with the expert intention by enforcing the embeddings of expert data into a calibrated contextual representation. We instantiate the expert-driven intrinsic rewards in sparse-reward offline RL tasks, offline imitation learning (IL) tasks, and unsupervised offline RL tasks. Empirically, we find that CLUE can effectively improve the sparse-reward offline RL performance, outperform the state-of-the-art offline IL baselines, and discover diverse skills from static reward-free offline data.
    Training Transformers with 4-bit Integers. (arXiv:2306.11987v2 [cs.LG] UPDATED)
Quantizing the activations, weights, and gradients to 4 bits is promising for accelerating neural network training. However, existing 4-bit training methods require custom numerical formats which are not supported by contemporary hardware. In this work, we propose a training method for transformers with all matrix multiplications implemented with INT4 arithmetic. Training with an ultra-low INT4 precision is challenging. To achieve this, we carefully analyze the specific structures of activations and gradients in transformers to propose dedicated quantizers for them. For forward propagation, we identify the challenge of outliers and propose a Hadamard quantizer to suppress the outliers. For backpropagation, we leverage the structural sparsity of gradients by proposing bit splitting and leverage score sampling techniques to quantize gradients accurately. Our algorithm achieves competitive accuracy on a wide range of tasks including natural language understanding, machine translation, and image classification. Unlike previous 4-bit training methods, our algorithm can be implemented on the current generation of GPUs. Our prototypical linear operator implementation is up to 2.2 times faster than the FP16 counterparts and speeds up the training by up to 35.1%.
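The paper's exact quantizers are more involved; the sketch below only illustrates the general idea of quantizing in a Hadamard-rotated basis, where a single activation outlier gets spread across all dimensions before rounding to the INT4 grid. The Sylvester construction and the per-tensor scale are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def hadamard(n):
    """Orthonormal Hadamard matrix via the Sylvester construction (n = 2^k)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def quantize_int4(x):
    """Symmetric per-tensor quantization onto the INT4 integer grid [-8, 7]."""
    scale = np.abs(x).max() / 7.0 + 1e-12
    q = np.clip(np.round(x / scale), -8, 7)
    return q, scale

d = 8
H = hadamard(d)
x = np.random.randn(d)
x[0] = 50.0                                      # a single large outlier
q, s = quantize_int4(H @ x)                      # rotate, then quantize
x_hat = H.T @ (q * s)                            # dequantize, rotate back
print(np.abs(x - x_hat).max())                   # typically smaller than quantizing x directly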
    Improving Gender Fairness of Pre-Trained Language Models without Catastrophic Forgetting. (arXiv:2110.05367v2 [cs.CL] UPDATED)
Existing studies addressing gender bias of pre-trained language models usually build a small gender-neutral data set and conduct a second-phase pre-training on the model with such data. However, given the limited size and concentrated focus of the gender-neutral data, catastrophic forgetting would occur during second-phase pre-training. Forgetting information in the original training data may damage the model's downstream performance by a large margin. In this work, we empirically show that catastrophic forgetting occurs in such methods by evaluating them on general NLP tasks in GLUE. Then, we propose a new method, GEnder Equality Prompt (GEEP), to improve gender fairness of pre-trained models with less forgetting. GEEP freezes the pre-trained model and learns gender-related prompts with gender-neutral data. Empirical results show that GEEP not only achieves SOTA performance on gender fairness tasks, but also forgets less and performs better on GLUE by a large margin.  ( 2 min )
Best Arm Identification in Stochastic Bandits: Beyond $\beta$-optimality. (arXiv:2301.03785v2 [stat.ML] UPDATED)
This paper investigates a hitherto unaddressed aspect of best arm identification (BAI) in stochastic multi-armed bandits in the fixed-confidence setting. Two key metrics for assessing bandit algorithms are computational efficiency and performance optimality (e.g., in sample complexity). In the stochastic BAI literature, there have been advances in designing algorithms to achieve optimal performance, but they are generally computationally expensive to implement (e.g., optimization-based methods). There also exist approaches with high computational efficiency, but they have provable gaps to the optimal performance (e.g., the $\beta$-optimal approaches in top-two methods). This paper introduces a framework and an algorithm for BAI that achieves optimal performance with a computationally efficient set of decision rules. The central process that facilitates this is a routine for sequentially estimating the optimal allocations up to sufficient fidelity. Specifically, these estimates are accurate enough for identifying the best arm (hence, achieving optimality) but not accurate to an unnecessary extent that would create excessive computational complexity (hence, maintaining efficiency). Furthermore, the existing relevant literature focuses on the family of exponential distributions. This paper considers a more general setting of any arbitrary family of distributions parameterized by their mean values (under mild regularity conditions). The optimality is established analytically, and numerical evaluations are provided to assess the analytical guarantees and compare the performance with that of existing approaches.  ( 2 min )
The Challenges of HTR Model Training: Feedback from the Project Donner le goût de l'archive à l'ère numérique. (arXiv:2212.11146v3 [cs.CV] UPDATED)
The arrival of handwriting recognition technologies offers new possibilities for research in heritage studies. However, it is now necessary to reflect on the experiences and the practices developed by research teams. Our use of the Transkribus platform since 2018 has led us to search for the most significant ways to improve the performance of our handwritten text recognition (HTR) models, which are made to transcribe French handwriting dating from the 17th century. This article therefore reports on the impact of creating transcription protocols, using the language model at full scale, and determining the best way to use base models in order to help increase the performance of HTR models. Combining all of these elements can indeed increase the performance of a single model by more than 20% (reaching a Character Error Rate below 5%). This article also discusses some challenges regarding the collaborative nature of HTR platforms such as Transkribus and the way researchers can share their data generated in the process of creating or training handwritten text recognition models.
    Catching Image Retrieval Generalization. (arXiv:2306.13357v1 [cs.LG])
    The concepts of overfitting and generalization are vital for evaluating machine learning models. In this work, we show that the popular Recall@K metric depends on the number of classes in the dataset, which limits its ability to estimate generalization. To fix this issue, we propose a new metric, which measures retrieval performance, and, unlike Recall@K, estimates generalization. We apply the proposed metric to popular image retrieval methods and provide new insights about deep metric learning generalization.  ( 2 min )
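As a reference point, here is a minimal NumPy implementation of the standard Recall@K metric that the paper critiques. With random embeddings, the score lands near 1/(number of classes), which is exactly the class-count dependence the abstract highlights.

```python
import numpy as np

def recall_at_k(embeddings, labels, k):
    """Recall@K: fraction of queries whose K nearest neighbours
    (self excluded) contain at least one same-class item."""
    d = np.linalg.norm(embeddings[:, None, :] - embeddings[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # never retrieve the query itself
    nn = np.argsort(d, axis=1)[:, :k]    # indices of the K nearest neighbours
    hits = [(labels[nn[i]] == labels[i]).any() for i in range(len(labels))]
    return float(np.mean(hits))

rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 8))
lab = rng.integers(0, 5, size=100)       # 5 classes; embeddings carry no signal
print(recall_at_k(emb, lab, k=1))        # close to 1/5 by chance alone
```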
    LEACE: Perfect linear concept erasure in closed form. (arXiv:2306.03819v2 [cs.LG] UPDATED)
    Concept erasure aims to remove specified features from a representation. It can improve fairness (e.g. preventing a classifier from using gender or race) and interpretability (e.g. removing a concept to observe changes in model behavior). We introduce LEAst-squares Concept Erasure (LEACE), a closed-form method which provably prevents all linear classifiers from detecting a concept while changing the representation as little as possible, as measured by a broad class of norms. We apply LEACE to large language models with a novel procedure called "concept scrubbing," which erases target concept information from every layer in the network. We demonstrate our method on two tasks: measuring the reliance of language models on part-of-speech information, and reducing gender bias in BERT embeddings. Code is available at https://github.com/EleutherAI/concept-erasure.
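The repository above provides the official implementation. As a hedged illustration of linear concept erasure in general (this is a naive projection baseline, not the norm-minimal LEACE closed form, which additionally whitens the representation), one can project out the cross-covariance directions between features and concept; zero cross-covariance is precisely the condition under which no linear classifier can detect the concept.

```python
import numpy as np

def erase_linear_concept(X, z):
    """Project out the feature directions that carry linear information
    about concept z. A simple eraser for illustration; LEACE instead
    changes the representation as little as possible."""
    Xc = X - X.mean(0)
    Zc = (z - z.mean(0)).reshape(len(z), -1)
    U, _, _ = np.linalg.svd(Xc.T @ Zc, full_matrices=False)
    P = np.eye(X.shape[1]) - U @ U.T             # projector onto the null space
    return X.mean(0) + Xc @ P

rng = np.random.default_rng(0)
z = rng.integers(0, 2, size=500).astype(float)
X = rng.normal(size=(500, 16)) + np.outer(z, np.ones(16))   # leak z into X
X_clean = erase_linear_concept(X, z)
# Zero cross-covariance with z => no linear classifier can recover it.
print(np.abs((X_clean - X_clean.mean(0)).T @ (z - z.mean())).max())
```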
    Binary domain generalization for sparsifying binary neural networks. (arXiv:2306.13515v1 [cs.LG])
Binary neural networks (BNNs) are an attractive solution for developing and deploying deep neural network (DNN)-based applications on resource-constrained devices. Despite their success, BNNs still suffer from a fixed and limited compression factor, which may be explained by the fact that existing pruning methods for full-precision DNNs cannot be directly applied to BNNs. In fact, weight pruning of BNNs leads to performance degradation, which suggests that the standard binarization domain of BNNs is not well adapted for the task. This work proposes a novel, more general binary domain that extends the standard binary one and is more robust to pruning techniques, thus guaranteeing improved compression and avoiding severe performance losses. We demonstrate a closed-form solution for quantizing the weights of a full-precision network into the proposed binary domain. Finally, we show the flexibility of our method, which can be combined with other pruning strategies. Experiments over CIFAR-10 and CIFAR-100 demonstrate that the novel approach is able to generate efficient sparse networks with reduced memory usage and run-time latency, while maintaining performance.
    Selectively increasing the diversity of GAN-generated samples. (arXiv:2207.01561v3 [cs.CV] UPDATED)
Generative Adversarial Networks (GANs) are powerful models able to synthesize data samples closely resembling the distribution of real data, yet the diversity of those generated samples is limited due to the so-called mode collapse phenomenon observed in GANs. Especially prone to mode collapse are conditional GANs, which tend to ignore the input noise vector and focus on the conditional information. Recent methods proposed to mitigate this limitation increase the diversity of generated samples, yet they reduce the performance of the models when similarity of samples is required. To address this shortcoming, we propose a novel method to selectively increase the diversity of GAN-generated samples. By adding a simple, yet effective regularization to the training loss function we encourage the generator to discover new data modes for inputs related to diverse outputs while generating consistent samples for the remaining ones. More precisely, we maximise the ratio of distances between generated images and input latent vectors, scaling the effect according to the diversity of samples for a given conditional input. We show the superiority of our method in a synthetic benchmark as well as a real-life scenario of simulating data from the Zero Degree Calorimeter of the ALICE experiment at the LHC, CERN.
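A minimal PyTorch sketch of a diversity regularizer in this spirit, assuming a conditional generator with signature G(z, c); the per-condition scaling of the effect described in the abstract is omitted for brevity.

```python
import torch

def diversity_loss(G, c, z_dim, eps=1e-8):
    """Encourage distant latent codes (under the same condition c) to map
    to distant images; minimizing this maximizes the distance ratio."""
    z1 = torch.randn(c.size(0), z_dim, device=c.device)
    z2 = torch.randn(c.size(0), z_dim, device=c.device)
    x1, x2 = G(z1, c), G(z2, c)
    img_dist = (x1 - x2).flatten(1).norm(dim=1)  # distance between images
    z_dist = (z1 - z2).norm(dim=1)               # distance between latents
    return (z_dist / (img_dist + eps)).mean()
```

Adding lambda * diversity_loss(G, c, z_dim) to the generator objective penalizes collapsing distinct latent codes onto the same output.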
    Approximate Causal Effect Identification under Weak Confounding. (arXiv:2306.13242v1 [stat.ML])
    Causal effect estimation has been studied by many researchers when only observational data is available. Sound and complete algorithms have been developed for pointwise estimation of identifiable causal queries. For non-identifiable causal queries, researchers developed polynomial programs to estimate tight bounds on causal effect. However, these are computationally difficult to optimize for variables with large support sizes. In this paper, we analyze the effect of "weak confounding" on causal estimands. More specifically, under the assumption that the unobserved confounders that render a query non-identifiable have small entropy, we propose an efficient linear program to derive the upper and lower bounds of the causal effect. We show that our bounds are consistent in the sense that as the entropy of unobserved confounders goes to zero, the gap between the upper and lower bound vanishes. Finally, we conduct synthetic and real data simulations to compare our bounds with the bounds obtained by the existing work that cannot incorporate such entropy constraints and show that our bounds are tighter for the setting with weak confounders.
    On the Informativeness of Supervision Signals. (arXiv:2211.01407v2 [cs.LG] UPDATED)
    Supervised learning typically focuses on learning transferable representations from training examples annotated by humans. While rich annotations (like soft labels) carry more information than sparse annotations (like hard labels), they are also more expensive to collect. For example, while hard labels only provide information about the closest class an object belongs to (e.g., "this is a dog"), soft labels provide information about the object's relationship with multiple classes (e.g., "this is most likely a dog, but it could also be a wolf or a coyote"). We use information theory to compare how a number of commonly-used supervision signals contribute to representation-learning performance, as well as how their capacity is affected by factors such as the number of labels, classes, dimensions, and noise. Our framework provides theoretical justification for using hard labels in the big-data regime, but richer supervision signals for few-shot learning and out-of-distribution generalization. We validate these results empirically in a series of experiments with over 1 million crowdsourced image annotations and conduct a cost-benefit analysis to establish a tradeoff curve that enables users to optimize the cost of supervising representation learning on their own datasets.
    Compression with Bayesian Implicit Neural Representations. (arXiv:2305.19185v2 [cs.LG] UPDATED)
Many common types of data can be represented as functions that map coordinates to signal values, such as pixel locations to RGB values in the case of an image. Based on this view, data can be compressed by overfitting a compact neural network to its functional representation and then encoding the network weights. However, most current solutions for this are inefficient, as quantization to low-bit precision substantially degrades the reconstruction quality. To address this issue, we propose overfitting variational Bayesian neural networks to the data and compressing an approximate posterior weight sample using relative entropy coding instead of quantizing and entropy coding it. This strategy enables direct optimization of the rate-distortion performance by minimizing the $\beta$-ELBO, and allows targeting different rate-distortion trade-offs for a given network architecture by adjusting $\beta$. Moreover, we introduce an iterative algorithm for learning prior weight distributions and employ a progressive refinement process for the variational posterior that significantly enhances performance. Experiments show that our method achieves strong performance on image and audio compression while retaining simplicity.
    DeepJoin: Joinable Table Discovery with Pre-trained Language Models. (arXiv:2212.07588v2 [cs.DB] UPDATED)
Due to the usefulness in data enrichment for data analysis tasks, joinable table discovery has become an important operation in data lake management. Existing approaches target equi-joins, the most common way of combining tables for creating a unified view, or semantic joins, which tolerate misspellings and different formats to deliver more join results. They are either exact solutions whose running time is linear in the sizes of query column and target table repository or approximate solutions lacking precision. In this paper, we propose DeepJoin, a deep learning model for accurate and efficient joinable table discovery. Our solution is an embedding-based retrieval, which employs a pre-trained language model (PLM) and is designed as one framework serving both equi- and semantic joins. We propose a set of contextualization options to transform column contents to a text sequence. The PLM reads the sequence and is fine-tuned to embed columns to vectors such that columns are expected to be joinable if they are close to each other in the vector space. Since the output of the PLM is fixed in length, the subsequent search procedure becomes independent of the column size. With a state-of-the-art approximate nearest neighbor search algorithm, the search time is logarithmic in the repository size. To train the model, we devise the techniques for preparing training data as well as data augmentation. The experiments on real datasets demonstrate that by training on a small subset of a corpus, DeepJoin generalizes to large datasets and its precision consistently outperforms that of other approximate solutions. DeepJoin is even more accurate than an exact solution to semantic joins when evaluated with labels from experts. Moreover, when equipped with a GPU, DeepJoin is up to two orders of magnitude faster than existing solutions.
    A Survey on Multimodal Large Language Models. (arXiv:2306.13549v1 [cs.CV])
The Multimodal Large Language Model (MLLM) has recently emerged as a rising research hotspot, using powerful Large Language Models (LLMs) as a brain to perform multimodal tasks. The surprising emergent capabilities of MLLMs, such as writing stories based on images and OCR-free math reasoning, are rare in traditional methods, suggesting a potential path to artificial general intelligence. In this paper, we aim to trace and summarize the recent progress of MLLMs. First of all, we present the formulation of the MLLM and delineate its related concepts. Then, we discuss the key techniques and applications, including Multimodal Instruction Tuning (M-IT), Multimodal In-Context Learning (M-ICL), Multimodal Chain of Thought (M-CoT), and LLM-Aided Visual Reasoning (LAVR). Finally, we discuss existing challenges and point out promising research directions. In light of the fact that the era of MLLMs has only just begun, we will keep updating this survey and hope it can inspire more research. An associated GitHub link collecting the latest papers is available at https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models.
    Neural Algorithmic Reasoning Without Intermediate Supervision. (arXiv:2306.13411v1 [cs.LG])
    Neural Algorithmic Reasoning is an emerging area of machine learning focusing on building models which can imitate the execution of classic algorithms, such as sorting, shortest paths, etc. One of the main challenges is to learn algorithms that are able to generalize to out-of-distribution data, in particular with significantly larger input sizes. Recent work on this problem has demonstrated the advantages of learning algorithms step-by-step, giving models access to all intermediate steps of the original algorithm. In this work, we instead focus on learning neural algorithmic reasoning only from the input-output pairs without appealing to the intermediate supervision. We propose simple but effective architectural improvements and also build a self-supervised objective that can regularise intermediate computations of the model without access to the algorithm trajectory. We demonstrate that our approach is competitive to its trajectory-supervised counterpart on tasks from the CLRS Algorithmic Reasoning Benchmark and achieves new state-of-the-art results for several problems, including sorting, where we obtain significant improvements. Thus, learning without intermediate supervision is a promising direction for further research on neural reasoners.  ( 2 min )
    DiffInfinite: Large Mask-Image Synthesis via Parallel Random Patch Diffusion in Histopathology. (arXiv:2306.13384v1 [eess.IV])
    We present DiffInfinite, a hierarchical diffusion model that generates arbitrarily large histological images while preserving long-range correlation structural information. Our approach first generates synthetic segmentation masks, subsequently used as conditions for the high-fidelity generative diffusion process. The proposed sampling method can be scaled up to any desired image size while only requiring small patches for fast training. Moreover, it can be parallelized more efficiently than previous large-content generation methods while avoiding tiling artefacts. The training leverages classifier-free guidance to augment a small, sparsely annotated dataset with unlabelled data. Our method alleviates unique challenges in histopathological imaging practice: large-scale information, costly manual annotation, and protective data handling. The biological plausibility of DiffInfinite data is validated in a survey by ten experienced pathologists as well as a downstream segmentation task. Furthermore, the model scores strongly on anti-copying metrics which is beneficial for the protection of patient data.  ( 2 min )
    Pre or Post-Softmax Scores in Gradient-based Attribution Methods, What is Best?. (arXiv:2306.13197v1 [cs.LG])
Gradient-based attribution methods for neural networks working as classifiers use gradients of network scores. Here we discuss the practical differences between using gradients of pre-softmax scores versus post-softmax scores, and their respective advantages and disadvantages.
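The practical difference is easy to see in code: the post-softmax gradient couples the target class to all competing classes through the softmax Jacobian, while the pre-softmax gradient does not. A small PyTorch comparison on an untrained toy model, purely for illustration:

```python
import torch
import torch.nn.functional as F

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x = torch.randn(1, 3, 32, 32, requires_grad=True)

logits = model(x)                                # pre-softmax scores
probs = F.softmax(logits, dim=1)                 # post-softmax scores
k = logits.argmax(dim=1).item()

# Saliency from the pre-softmax score of class k ...
g_pre = torch.autograd.grad(logits[0, k], x, retain_graph=True)[0]
# ... versus from its post-softmax probability, which also pulls in the
# gradients of every competing class through the softmax coupling.
g_post = torch.autograd.grad(probs[0, k], x)[0]
print(torch.cosine_similarity(g_pre.flatten(), g_post.flatten(), dim=0))
```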
    Learning rewards for robotic ultrasound scanning using probabilistic temporal ranking. (arXiv:2002.01240v3 [cs.RO] UPDATED)
Informative path-planning is a well established approach to visual-servoing and active viewpoint selection in robotics, but typically assumes that a suitable cost function or goal state is known. This work considers the inverse problem, where the goal of the task is unknown, and a reward function needs to be inferred from exploratory example demonstrations provided by a demonstrator, for use in a downstream informative path-planning policy. Unfortunately, many existing reward inference strategies are unsuited to this class of problems, due to the exploratory nature of the demonstrations. In this paper, we propose an alternative approach to cope with the class of problems where these sub-optimal, exploratory demonstrations occur. We hypothesise that, in tasks which require discovery, successive states of any demonstration are progressively more likely to be associated with a higher reward, and use this hypothesis to generate time-based binary comparison outcomes and infer reward functions that support these ranks, under a probabilistic generative model. We formalise this probabilistic temporal ranking approach and show that it improves upon existing approaches to perform reward inference for autonomous ultrasound scanning, a novel application of learning from demonstration in medical imaging, while also being of value across a broad range of goal-oriented learning from demonstration tasks. Keywords: visual servoing, reward inference, probabilistic temporal ranking.  ( 3 min )
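A hedged sketch of the core idea, assuming a state-based reward network: sample time-ordered pairs from a single exploratory demonstration, label the later state as preferred, and fit the reward under a Bradley-Terry style likelihood. Names, dimensions, and batch size are illustrative, not the paper's full probabilistic generative model.

```python
import torch

def temporal_ranking_loss(reward_net, states, n_pairs=128):
    """Temporal ranking: within one exploratory demonstration, later
    states are assumed more likely to carry higher reward."""
    T = states.size(0)
    i = torch.randint(0, T, (n_pairs,))
    j = torch.randint(0, T, (n_pairs,))
    earlier, later = torch.minimum(i, j), torch.maximum(i, j)
    keep = earlier != later                      # drop degenerate pairs
    r_e = reward_net(states[earlier[keep]]).squeeze(-1)
    r_l = reward_net(states[later[keep]]).squeeze(-1)
    # Bradley-Terry: P(later preferred) = sigmoid(r_later - r_earlier).
    return -torch.nn.functional.logsigmoid(r_l - r_e).mean()

reward_net = torch.nn.Sequential(torch.nn.Linear(4, 64), torch.nn.ReLU(),
                                 torch.nn.Linear(64, 1))
demo = torch.randn(200, 4)                       # one demonstration trajectory
print(temporal_ranking_loss(reward_net, demo))
```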
    SPRT-based Efficient Best Arm Identification in Stochastic Bandits. (arXiv:2207.11158v3 [stat.ML] UPDATED)
This paper investigates the best arm identification (BAI) problem in stochastic multi-armed bandits in the fixed confidence setting. The general class of the exponential family of bandits is considered. The existing algorithms for the exponential family of bandits face computational challenges. To mitigate these challenges, the BAI problem is viewed and analyzed as a sequential composite hypothesis testing task, and a framework is proposed that adopts the likelihood ratio-based tests known to be effective for sequential testing. Based on this test statistic, a BAI algorithm is designed that leverages the canonical sequential probability ratio tests for arm selection and is amenable to tractable analysis for the exponential family of bandits. This algorithm has two key features: (1) its sample complexity is asymptotically optimal, and (2) it is guaranteed to be $\delta$-PAC. Existing efficient approaches focus on the Gaussian setting and require Thompson sampling for the arm deemed the best and the challenger arm. Additionally, this paper analytically quantifies the computational expense of identifying the challenger in an existing approach. Finally, numerical experiments are provided to support the analysis.
    Trading-off price for data quality to achieve fair online allocation. (arXiv:2306.13440v1 [cs.LG])
We consider the problem of online allocation subject to a long-term fairness penalty. Contrary to existing works, however, we do not assume that the decision-maker observes the protected attributes -- which is often unrealistic in practice. Instead, they can purchase data that help estimate them from sources of different quality, and hence reduce the fairness penalty at some cost. We model this problem as a multi-armed bandit problem where each arm corresponds to the choice of a data source, coupled with the online allocation problem. We propose an algorithm that jointly solves both problems and show that it has a regret bounded by $\mathcal{O}(\sqrt{T})$. A key difficulty is that the rewards received by selecting a source are correlated through the fairness penalty, which leads to a need for randomization (despite a stochastic setting). Our algorithm takes into account contextual information available before the source selection, and can adapt to many different fairness notions. We also show that in some instances, the estimates used can be learned on the fly.  ( 2 min )
    Retrieval of Boost Invariant Symbolic Observables via Feature Importance. (arXiv:2306.13496v1 [physics.comp-ph])
Deep learning approaches for jet tagging in high-energy physics are characterized as black boxes that process a large amount of information from which it is difficult to extract key distinctive observables. In this proceeding, we present an alternative to deep learning approaches, Boost Invariant Polynomials, which enables direct analysis of simple analytic expressions representing the most important features in a given task. Further, we show how this approach provides an extremely low-dimensional classifier with a minimum set of features representing discriminating, physically relevant observables, and how it consequently speeds up the algorithm execution, with performance relatively close to the algorithm using the full information.  ( 2 min )
    Offline Skill Graph (OSG): A Framework for Learning and Planning using Offline Reinforcement Learning Skills. (arXiv:2306.13630v1 [cs.RO])
Reinforcement Learning has received wide interest due to its success in competitive games. Yet its adoption in everyday applications (e.g., industrial, home, or healthcare settings) remains limited. In this paper, we address this limitation by presenting a framework for planning over offline skills and solving complex tasks in real-world environments. Our framework is comprised of three modules that together enable the agent to learn from previously collected data and generalize over it to solve long-horizon tasks. We demonstrate our approach by testing it on a robotic arm that is required to solve complex tasks.
    Co-creating a globally interpretable model with human input. (arXiv:2306.13381v1 [cs.HC])
We consider an aggregated human-AI collaboration aimed at generating a joint interpretable model. The model takes the form of Boolean decision rules, where human input is provided in the form of logical conditions or as partial templates. This focus on the combined construction of a model offers a different perspective on joint decision making. Previous efforts have typically focused on aggregating outcomes rather than decision logic. We demonstrate the proposed approach through two examples and highlight the usefulness and challenges of the approach.  ( 2 min )
    Correcting discount-factor mismatch in on-policy policy gradient methods. (arXiv:2306.13284v1 [cs.LG])
The policy gradient theorem gives a convenient form of the policy gradient in terms of three factors: an action value, a gradient of the action likelihood, and a state distribution involving discounting called the discounted stationary distribution. But commonly used on-policy methods based on the policy gradient theorem ignore the discount factor in the state distribution, which is technically incorrect and may even cause degenerate learning behavior in some environments. An existing solution corrects this discrepancy by using $\gamma^t$ as a factor in the gradient estimate. However, this solution is not widely adopted and does not work well in tasks where the later states are similar to earlier states. We introduce a novel distribution correction to account for the discounted stationary distribution that can be plugged into many existing gradient estimators. Our correction circumvents the performance degradation associated with the $\gamma^t$ correction with a lower variance. Importantly, compared to the uncorrected estimators, our algorithm provides improved state emphasis to evade suboptimal policies in certain environments and consistently matches or exceeds the original performance on several OpenAI gym and DeepMind suite benchmarks.  ( 2 min )
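For reference, the $\gamma^t$-corrected REINFORCE estimator that the paper improves upon looks as follows; the paper's own lower-variance distribution correction is not reproduced here. log_probs is the list of log pi(a_t|s_t) tensors collected during a rollout.

```python
import torch

def discounted_pg_loss(log_probs, rewards, gamma=0.99):
    """REINFORCE with the gamma^t state-distribution factor that the policy
    gradient theorem prescribes but most on-policy implementations drop."""
    returns, G = [], 0.0
    for r in reversed(rewards):                  # discounted return-to-go
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    loss = torch.zeros(())
    for t, (lp, G_t) in enumerate(zip(log_probs, returns)):
        loss = loss - (gamma ** t) * G_t * lp    # gamma^t reweights the states
    return loss
```

Dropping the gamma ** t factor recovers the common (technically incorrect) estimator the abstract describes.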
    Exploring Context Generalizability in Citywide Crowd Mobility Prediction: An Analytic Framework and Benchmark. (arXiv:2106.16046v4 [cs.LG] UPDATED)
Contextual features are important data sources for building citywide crowd mobility prediction models. However, the difficulty of applying context lies in the unknown generalizability of contextual features (e.g., weather, holiday, and points of interest) and context modeling techniques across different scenarios. In this paper, we present a unified analytic framework and a large-scale benchmark for evaluating context generalizability. The benchmark includes crowd mobility data, contextual data, and advanced prediction models. We conduct comprehensive experiments in several crowd mobility prediction tasks such as bike flow, metro passenger flow, and electric vehicle charging demand. Our results reveal several important observations: (1) Using more contextual features may not always result in better prediction with existing context modeling techniques; in particular, the combination of holiday and temporal position can provide more generalizable beneficial information than other contextual feature combinations. (2) In context modeling techniques, using a gated unit to incorporate raw contextual features into the deep prediction model has good generalizability. Besides, we offer several suggestions about incorporating contextual factors for building crowd mobility prediction applications. From our findings, we call for future research efforts devoted to developing new context modeling solutions.  ( 3 min )
    STEEL: Singularity-aware Reinforcement Learning. (arXiv:2301.13152v4 [stat.ML] UPDATED)
Batch reinforcement learning (RL) aims at leveraging pre-collected data to find an optimal policy that maximizes the expected total rewards in a dynamic environment. Nearly all existing algorithms rely on an absolute continuity assumption on the distribution induced by target policies with respect to the data distribution, so that the batch data can be used to calibrate target policies via the change of measure. However, the absolute continuity assumption could be violated in practice (e.g., no-overlap support), especially when the state-action space is large or continuous. In this paper, we propose a new batch RL algorithm without requiring absolute continuity in the setting of an infinite-horizon Markov decision process with continuous states and actions. We call our algorithm STEEL: SingulariTy-awarE rEinforcement Learning. Our algorithm is motivated by a new error analysis on off-policy evaluation, where we use maximum mean discrepancy, together with distributionally robust optimization, to characterize the error of off-policy evaluation caused by the possible singularity and to enable model extrapolation. By leveraging the idea of pessimism and under some mild conditions, we derive a finite-sample regret guarantee for our proposed algorithm without imposing absolute continuity. Compared with existing algorithms, by requiring only minimal data-coverage assumption, STEEL significantly improves the applicability and robustness of batch RL. Extensive simulation studies and one real experiment on personalized pricing demonstrate the superior performance of our method in dealing with possible singularity in batch RL.
    Solving Coupled Differential Equation Groups Using PINO-CDE. (arXiv:2210.00222v2 [cs.LG] UPDATED)
As a fundamental mathematical tool in many engineering disciplines, coupled differential equation groups are widely used to model complex structures containing multiple physical quantities. Engineers constantly adjust structural parameters at the design stage, which requires a highly efficient solver. The rise of deep learning technologies has offered new perspectives on this task. Unfortunately, existing black-box models suffer from poor accuracy and robustness, while the advanced methodologies of single-output operator regression cannot deal with multiple quantities simultaneously. To address these challenges, we propose PINO-CDE, a deep learning framework for solving coupled differential equation groups (CDEs), along with an equation normalization algorithm for performance enhancement. Based on the theory of the physics-informed neural operator (PINO), PINO-CDE uses a single network for all quantities in a CDE group, instead of training dozens, or even hundreds, of networks as in the existing literature. We demonstrate the flexibility and feasibility of PINO-CDE for one toy example and two engineering applications: vehicle-track coupled dynamics (VTCD) and reliability assessment for a four-storey building (uncertainty propagation). The performance on VTCD indicates that PINO-CDE outperforms existing software and deep learning-based methods in terms of efficiency and precision, respectively. For the uncertainty propagation task, PINO-CDE provides higher-resolution results in less than a quarter of the time incurred when using the probability density evolution method (PDEM). This framework integrates engineering dynamics and deep learning technologies and may reveal a new concept for solving CDEs and for uncertainty propagation.
    From Contextual Data to Newsvendor Decisions: On the Actual Performance of Data-Driven Algorithms. (arXiv:2302.08424v2 [cs.LG] UPDATED)
In this work, we explore a framework for contextual decision-making to study how the relevance and quantity of past data affects the performance of a data-driven policy. We analyze a contextual Newsvendor problem in which a decision-maker needs to trade-off between an underage and an overage cost in the face of uncertain demand. We consider a setting in which past demands observed under "close by" contexts come from close by distributions and analyze the performance of data-driven algorithms through a notion of context-dependent worst-case expected regret. We analyze the broad class of Weighted Empirical Risk Minimization (WERM) policies which weigh past data according to their similarity in the contextual space. This class includes classical policies such as ERM, k-Nearest Neighbors and kernel-based policies. Our main methodological contribution is to characterize exactly the worst-case regret of any WERM policy on any given configuration of contexts. To the best of our knowledge, this provides the first understanding of tight performance guarantees in any contextual decision-making problem, with past literature focusing on upper bounds via concentration inequalities. We instead take an optimization approach, and isolate a structure in the Newsvendor loss function that allows to reduce the infinite-dimensional optimization problem over worst-case distributions to a simple line search. This in turn allows us to unveil fundamental insights that were obfuscated by previous general-purpose bounds. We characterize actual guaranteed performance as a function of the contexts, as well as granular insights on the learning curve of algorithms.
OVLA: Neural Network Ownership Verification using Latent Watermarks. (arXiv:2306.13215v1 [cs.CR])
Ownership verification for neural networks is important for protecting these models from illegal copying, free-riding, re-distribution and other intellectual property misuse. We present a novel methodology for neural network ownership verification based on the notion of latent watermarks. Existing ownership verification methods either modify or introduce constraints to the neural network parameters, which are accessible to an attacker in a white-box attack and can be harmful to the network's normal operation, or train the network to respond to specific watermarks in the inputs similar to data poisoning-based backdoor attacks, which are susceptible to backdoor removal techniques. In this paper, we address these problems by decoupling a network's normal operation from its responses to watermarked inputs during ownership verification. The key idea is to train the network such that the watermarks remain dormant unless the owner's secret key is applied to activate them. The secret key is realized as a specific perturbation of the network's parameters known only to the owner. We show that our approach offers strong defense against backdoor detection, backdoor removal and surrogate model attacks. In addition, our method provides protection against ambiguity attacks where the attacker either tries to guess the secret weight key or uses fine-tuning to embed their own watermarks with a different key into a pre-trained neural network. Experimental results demonstrate the advantages and effectiveness of our proposed approach.
    Prediction under Latent Subgroup Shifts with High-Dimensional Observations. (arXiv:2306.13472v1 [stat.ML])
    We introduce a new approach to prediction in graphical models with latent-shift adaptation, i.e., where source and target environments differ in the distribution of an unobserved confounding latent variable. Previous work has shown that as long as "concept" and "proxy" variables with appropriate dependence are observed in the source environment, the latent-associated distributional changes can be identified, and target predictions adapted accurately. However, practical estimation methods do not scale well when the observations are complex and high-dimensional, even if the confounding latent is categorical. Here we build upon a recently proposed probabilistic unsupervised learning framework, the recognition-parametrised model (RPM), to recover low-dimensional, discrete latents from image observations. Applied to the problem of latent shifts, our novel form of RPM identifies causal latent structure in the source environment, and adapts properly to predict in the target. We demonstrate results in settings where predictor and proxy are high-dimensional images, a context to which previous methods fail to scale.  ( 2 min )
    Understanding quantum machine learning also requires rethinking generalization. (arXiv:2306.13461v1 [quant-ph])
    Quantum machine learning models have shown successful generalization performance even when trained with few data. In this work, through systematic randomization experiments, we show that traditional approaches to understanding generalization fail to explain the behavior of such quantum models. Our experiments reveal that state-of-the-art quantum neural networks accurately fit random states and random labeling of training data. This ability to memorize random data defies current notions of small generalization error, problematizing approaches that build on complexity measures such as the VC dimension, the Rademacher complexity, and all their uniform relatives. We complement our empirical results with a theoretical construction showing that quantum neural networks can fit arbitrary labels to quantum states, hinting at their memorization ability. Our results do not preclude the possibility of good generalization with few training data but rather rule out any possible guarantees based only on the properties of the model family. These findings expose a fundamental challenge in the conventional understanding of generalization in quantum machine learning and highlight the need for a paradigm shift in the design of quantum models for machine learning tasks.  ( 2 min )
    Network Slicing via Transfer Learning aided Distributed Deep Reinforcement Learning. (arXiv:2301.03262v2 [cs.NI] UPDATED)
Deep reinforcement learning (DRL) has been increasingly employed to handle the dynamic and complex resource management in network slicing. The deployment of DRL policies in real networks, however, is complicated by heterogeneous cell conditions. In this paper, we propose a novel transfer learning (TL) aided multi-agent deep reinforcement learning (MADRL) approach with inter-agent similarity analysis for inter-cell inter-slice resource partitioning. First, we design a coordinated MADRL method with information sharing to intelligently partition resource to slices and manage inter-cell interference. Second, we propose an integrated TL method to transfer the learned DRL policies among different local agents for accelerating the policy deployment. The method is composed of a new domain and task similarity measurement approach and a new knowledge transfer approach, which resolves the problem of from whom to transfer and how to transfer. We evaluated the proposed solution with extensive simulations in a system-level simulator and show that our approach outperforms the state-of-the-art solutions in terms of performance, convergence speed and sample efficiency. Moreover, by applying TL, we achieve an additional gain of over 27% compared to the coordinated MADRL approach without TL.
    Breaking the Deadly Triad with a Target Network. (arXiv:2101.08862v9 [cs.LG] UPDATED)
    The deadly triad refers to the instability of a reinforcement learning algorithm when it employs off-policy learning, function approximation, and bootstrapping simultaneously. In this paper, we investigate the target network as a tool for breaking the deadly triad, providing theoretical support for the conventional wisdom that a target network stabilizes training. We first propose and analyze a novel target network update rule which augments the commonly used Polyak-averaging style update with two projections. We then apply the target network and ridge regularization in several divergent algorithms and show their convergence to regularized TD fixed points. Those algorithms are off-policy with linear function approximation and bootstrapping, spanning both policy evaluation and control, as well as both discounted and average-reward settings. In particular, we provide the first convergent linear $Q$-learning algorithms under nonrestrictive and changing behavior policies without bi-level optimization.
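The commonly used Polyak-averaging update that the proposed rule augments (the paper's two additional projections are omitted here) is simply a slow exponential tracking of the online parameters; a minimal PyTorch sketch:

```python
import copy
import torch

def polyak_update(online, target, tau=0.005):
    """theta_target <- (1 - tau) * theta_target + tau * theta_online."""
    with torch.no_grad():
        for p, p_t in zip(online.parameters(), target.parameters()):
            p_t.mul_(1.0 - tau).add_(tau * p)

q_net = torch.nn.Linear(4, 2)                    # online Q-network
target_net = copy.deepcopy(q_net)                # slowly tracking copy
polyak_update(q_net, target_net)                 # call once per training step
```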
    Self-Supervised Time-to-Event Modeling with Structured Medical Records. (arXiv:2301.03150v2 [cs.LG] UPDATED)
    Time-to-event (TTE) models are used in medicine and other fields for estimating the probability distribution of the time until a specific event occurs. TTE models provide many advantages over classification using fixed time horizons, including naturally handling censored observations, but require more parameters and are challenging to train in settings with limited labeled data. Existing approaches, e.g. proportional hazards or accelerated failure time, employ distributional assumptions to reduce parameters but are vulnerable to model misspecification. In this work, we address these challenges with MOTOR (Many Outcome Time Oriented Representations), a self-supervised model that leverages temporal structure found in collections of timestamped events in electronic health records (EHR) and health insurance claims. MOTOR uses a TTE pretraining objective that predicts the probability distribution of times when events occur, making it well-suited to transfer learning for medical prediction tasks. Having pretrained on EHR and claims data of up to 55M patient records (9B clinical events), we evaluate performance after finetuning for 19 tasks across two datasets. Task-specific models built using MOTOR improve time-dependent C statistics by 4.6% over state-of-the-art while greatly improving sample efficiency, achieving comparable performance to existing methods using only 5% of available task data.  ( 2 min )
    On Sensitivity and Robustness of Normalization Schemes to Input Distribution Shifts in Automatic MR Image Diagnosis. (arXiv:2306.13276v1 [eess.IV])
Magnetic Resonance Imaging (MRI) is considered the gold standard of medical imaging because of the excellent soft-tissue contrast exhibited in the images reconstructed by the MRI pipeline, which in turn enables the human radiologist to discern many pathologies easily. More recently, Deep Learning (DL) models have also achieved state-of-the-art performance in diagnosing multiple diseases using these reconstructed images as input. However, the image reconstruction process within the MRI pipeline, which requires the use of complex hardware and adjustment of a large number of scanner parameters, is highly susceptible to noise of various forms, resulting in arbitrary artifacts within the images. Furthermore, the noise distribution is not stationary and varies within a machine, across machines, and patients, leading to varying artifacts within the images. Unfortunately, DL models are quite sensitive to these varying artifacts as it leads to changes in the input data distribution between the training and testing phases. The lack of robustness of these models against varying artifacts impedes their use in medical applications where safety is critical. In this work, we focus on improving the generalization performance of these models in the presence of multiple varying artifacts that manifest due to the complexity of the MR data acquisition. In our experiments, we observe that Batch Normalization, a widely used technique during the training of DL models for medical image analysis, is a significant cause of performance degradation in these changing environments. As a solution, we propose to use other normalization techniques, such as Group Normalization (GN) and Layer Normalization (LN), to inject robustness into model performance against varying image artifacts. Through a systematic set of experiments, we show that GN and LN provide better accuracy for various MR artifacts and distribution shifts.  ( 3 min )
    DVFO: Learning-Based DVFS for Energy-Efficient Edge-Cloud Collaborative Inference. (arXiv:2306.01811v3 [cs.LG] UPDATED)
    Due to limited resources on edge and different characteristics of deep neural network (DNN) models, it is a big challenge to optimize DNN inference performance in terms of energy consumption and end-to-end latency on edge devices. In addition to the dynamic voltage frequency scaling (DVFS) technique, the edge-cloud architecture provides a collaborative approach for efficient DNN inference. However, current edge-cloud collaborative inference methods have not optimized various compute resources on edge devices. Thus, we propose DVFO, a novel DVFS-enabled edge-cloud collaborative inference framework, which co-optimizes DVFS and offloading parameters via deep reinforcement learning (DRL). Specifically, DVFO automatically co-optimizes 1) the CPU, GPU and memory frequencies of edge devices, and 2) the feature maps to be offloaded to cloud servers. In addition, it leverages a thinking-while-moving concurrent mechanism to accelerate the DRL learning process, and a spatial-channel attention mechanism to extract DNN feature maps of secondary importance for workload offloading. This approach improves inference performance for different DNN models under various edge-cloud network conditions. Extensive evaluations using two datasets and six widely-deployed DNN models on three heterogeneous edge devices show that DVFO significantly reduces the energy consumption by 33% on average, compared to state-of-the-art schemes. Moreover, DVFO achieves up to 28.6%-59.1% end-to-end latency reduction, while maintaining accuracy within 1% loss on average.  ( 3 min )
    A Machine Learning Pressure Emulator for Hydrogen Embrittlement. (arXiv:2306.13116v1 [cs.LG])
A recent alternative for hydrogen transportation as a mixture with natural gas is blending it into natural gas pipelines. However, hydrogen embrittlement of material is a major concern for scientists and gas installation designers to avoid process failures. In this paper, we propose a physics-informed machine learning model to predict the gas pressure on the pipes' inner wall. Despite their high-fidelity results, current PDE-based simulators are time- and computationally demanding. Using simulation data, we train an ML model to predict the pressure on the pipelines' inner walls, which is a first step for pipeline system surveillance. We found that the physics-based method outperformed the purely data-driven method and satisfies the physical constraints of the gas flow system.  ( 2 min )
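A generic physics-informed training loss in this spirit is sketched below; residual_fn is a stand-in for the residual of the actual gas-flow equations, which the abstract does not specify, so the exact formulation is an assumption.

```python
import torch

def physics_informed_loss(model, x_data, p_data, x_colloc, residual_fn, lam=1.0):
    """Data misfit plus the squared PDE residual at collocation points.
    residual_fn(model, x) is a placeholder for the governing equations."""
    data_loss = torch.mean((model(x_data) - p_data) ** 2)
    physics_loss = torch.mean(residual_fn(model, x_colloc) ** 2)
    return data_loss + lam * physics_loss
```

The physics term regularizes the model where no simulation data is available, which is typically what lets the physics-based variant outperform a purely data-driven fit.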
    Stochastic Gradient Descent under Markovian Sampling Schemes. (arXiv:2302.14428v3 [math.OC] UPDATED)
We study a variation of vanilla stochastic gradient descent where the optimizer only has access to a Markovian sampling scheme. These schemes encompass applications that range from decentralized optimization with a random walker (token algorithms), to RL and online system identification problems. We focus on obtaining rates of convergence under the least restrictive assumptions possible on the underlying Markov chain and on the functions optimized. We first unveil the theoretical lower bound for methods that sample stochastic gradients along the path of a Markov chain, revealing a dependence on the hitting time of the underlying Markov chain. We then study Markov chain SGD (MC-SGD) under much milder regularity assumptions than prior works (e.g., no bounded gradients or domain, and infinite state spaces). We finally introduce MC-SAG, an alternative to MC-SGD with variance reduction, that only depends on the hitting time of the Markov chain, thereby obtaining a communication-efficient token algorithm.
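A minimal token-algorithm sketch of Markov chain SGD: a single model copy performs a random walk over nodes, taking one local gradient step per visit. The graph, data, and step size are illustrative; as the abstract notes, convergence speed depends on the chain's hitting time.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 5
A = rng.random((n, n)) < 0.4                     # random undirected graph
A = A | A.T
np.fill_diagonal(A, True)                        # self-loops keep the chain aperiodic
data = [(rng.normal(size=(20, d)), rng.normal(size=20)) for _ in range(n)]

w, node, lr = np.zeros(d), 0, 0.01
for _ in range(2000):
    X, y = data[node]                            # gradient of the local least-squares loss
    w -= lr * X.T @ (X @ w - y) / len(y)
    node = rng.choice(np.flatnonzero(A[node]))   # token walks to a random neighbour
```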
    Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation. (arXiv:2211.04772v3 [cs.SD] UPDATED)
Audio Spectrogram Transformer models rule the field of Audio Tagging, outperforming the previously dominant Convolutional Neural Networks (CNNs). Their superiority is based on the ability to scale up and exploit large-scale datasets such as AudioSet. However, Transformers are demanding in terms of model size and computational requirements compared to CNNs. We propose a training procedure for efficient CNNs based on offline Knowledge Distillation (KD) from high-performing yet complex transformers. The proposed training schema and the efficient CNN design based on MobileNetV3 result in models outperforming previous solutions in terms of parameter and computational efficiency and prediction performance. We provide models of different complexity levels, scaling from low-complexity models up to a new state-of-the-art performance of .483 mAP on AudioSet. Source Code available at: https://github.com/fschmid56/EfficientAT
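For orientation, a generic Hinton-style distillation objective is shown below; this is not necessarily the paper's exact recipe (AudioSet tagging is multi-label, so the actual loss will differ in detail), but it illustrates what "offline KD from a transformer teacher" looks like when the teacher logits are precomputed.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, lam=0.5):
    """Blend the hard-label loss with a softened teacher-matching KL term."""
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                                  # keep the gradient scale constant
    ce = F.cross_entropy(student_logits, labels)
    return lam * ce + (1.0 - lam) * kd
```

In offline KD the teacher_logits are computed once and stored, so the expensive transformer never runs inside the CNN training loop.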
    MimiC: Combating Client Dropouts in Federated Learning by Mimicking Central Updates. (arXiv:2306.12212v2 [cs.LG] UPDATED)
    Federated learning (FL) is a promising framework for privacy-preserving collaborative learning, where model training tasks are distributed to clients and only the model updates need to be collected at a server. However, when being deployed at mobile edge networks, clients may have unpredictable availability and drop out of the training process, which hinders the convergence of FL. This paper tackles such a critical challenge. Specifically, we first investigate the convergence of the classical FedAvg algorithm with arbitrary client dropouts. We find that with the common choice of a decaying learning rate, FedAvg oscillates around a stationary point of the global loss function, which is caused by the divergence between the aggregated and desired central update. Motivated by this new observation, we then design a novel training algorithm named MimiC, where the server modifies each received model update based on the previous ones. The proposed modification of the received model updates mimics the imaginary central update irrespective of dropout clients. The theoretical analysis of MimiC shows that divergence between the aggregated and central update diminishes with proper learning rates, leading to its convergence. Simulation results further demonstrate that MimiC maintains stable convergence performance and learns better models than the baseline methods.
    Necessary and sufficient graphical conditions for optimal adjustment sets in causal graphical models with hidden variables. (arXiv:2102.10324v4 [cs.LG] UPDATED)
The problem of selecting optimal backdoor adjustment sets to estimate causal effects in graphical models with hidden and conditioned variables is addressed. Previous work has defined optimality as achieving the smallest asymptotic estimation variance and derived an optimal set for the case without hidden variables. For the case with hidden variables there can be settings where no optimal set exists and currently only a sufficient graphical optimality criterion of limited applicability has been derived. In the present work optimality is characterized as maximizing a certain adjustment information which allows to derive a necessary and sufficient graphical criterion for the existence of an optimal adjustment set and a definition and algorithm to construct it. Further, the optimal set is valid if and only if a valid adjustment set exists and has higher (or equal) adjustment information than the Adjust-set proposed in Perković et al. [Journal of Machine Learning Research, 18: 1--62, 2018] for any graph. The results translate to minimal asymptotic estimation variance for a class of estimators whose asymptotic variance follows a certain information-theoretic relation. Numerical experiments indicate that the asymptotic results also hold for relatively small sample sizes and that the optimal adjustment set or minimized variants thereof often yield better variance also beyond that estimator class. Surprisingly, among the randomly created setups more than 90% fulfill the optimality conditions indicating that also in many real-world scenarios graphical optimality may hold. Code is available as part of the python package https://github.com/jakobrunge/tigramite.
    Autoencoders for Real-Time SUEP Detection. (arXiv:2306.13595v1 [hep-ex])
Confining dark sectors with pseudo-conformal dynamics can produce Soft Unclustered Energy Patterns, or SUEPs, at the Large Hadron Collider: the production of dark quarks in proton-proton collisions leading to a dark shower and the high-multiplicity production of dark hadrons. The final experimental signature is spherically-symmetric energy deposits by an anomalously large number of soft Standard Model particles with a transverse energy of a few hundred MeV. The dominant background for the SUEP search, if it gets produced via gluon-gluon fusion, is multi-jet QCD events. We have developed a deep learning-based Anomaly Detection technique to reject QCD jets and identify any anomalous signature, including SUEP, in real-time in the High-Level Trigger system of the Compact Muon Solenoid experiment at the Large Hadron Collider. A deep convolutional neural autoencoder network has been trained using QCD events by taking transverse energy deposits in the inner tracker, electromagnetic calorimeter, and hadron calorimeter sub-detectors as 3-channel image data. To tackle the biggest challenge of the task, the sparsity of the data (only ~0.5% of the total ~300k image pixels have non-zero values), a non-standard loss function, the inverse of the so-called Dice loss, has been exploited. The trained autoencoder with learned spatial features of QCD jets can detect 40% of the SUEP events, with a QCD event mistagging rate as low as 2%. The model inference time has been measured using the Intel Core i5-9600KF processor and found to be ~20 ms, which comfortably satisfies the High-Level Trigger system's latency budget of O(100) ms. Given the unsupervised nature of autoencoder training, the trained model can be applied to any new physics model that predicts an experimental signature anomalous to QCD jets.
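One plausible reading of the "inverse Dice" objective is the reciprocal of the soft Dice coefficient, which heavily rewards reconstructing the few non-zero pixels in a very sparse image; the exact constants used in the paper may differ:

```python
import torch

def inverse_dice_loss(recon, target, eps=1e-8):
    """Reciprocal of the soft Dice coefficient. With ~0.5% non-zero
    pixels, a plain MSE would be dominated by the empty background;
    putting the overlap term in the denominator forces the autoencoder
    to reproduce the sparse energy deposits."""
    num = (recon ** 2).sum() + (target ** 2).sum() + eps
    den = 2.0 * (recon * target).sum() + eps
    return num / den
```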
    Efficient Model Selection for Predictive Pattern Mining Model by Safe Pattern Pruning. (arXiv:2306.13561v1 [stat.ML])
    Predictive pattern mining is an approach used to construct prediction models when the input is represented by structured data, such as sets, graphs, and sequences. The main idea behind predictive pattern mining is to build a prediction model by considering substructures, such as subsets, subgraphs, and subsequences (referred to as patterns), present in the structured data as features of the model. The primary challenge in predictive pattern mining lies in the exponential growth of the number of patterns with the complexity of the structured data. In this study, we propose the Safe Pattern Pruning (SPP) method to address the explosion of pattern numbers in predictive pattern mining. We also discuss how it can be effectively employed throughout the entire model building process in practical data analysis. To demonstrate the effectiveness of the proposed method, we conduct numerical experiments on regression and classification problems involving sets, graphs, and sequences.
    Cascade Subspace Clustering for Outlier Detection. (arXiv:2306.13500v1 [cs.CV])
Many methods based on sparse and low-rank representation have been developed, along with guarantees of correct outlier detection. Self-representation states that a point in a subspace can always be expressed as a linear combination of other points in the subspace. A suitable Markov chain can be defined on the self-representation, allowing us to distinguish inliers from outliers. However, the reconstruction error of the self-representation, which is still informative for outlier detection, is neglected. Inspired by gradient boosting, in this paper we propose a new outlier detection framework that combines a series of weak "outlier detectors" into a single strong one in an iterative fashion by constructing a multi-pass self-representation. At each stage, we construct a self-representation based on the elastic net and define a suitable Markov chain on it to detect outliers. The residual of the self-representation is then used in the next stage to learn the next weak outlier detector. This stage is repeated many times, and the final outlier decision aggregates the results of all stages. Experimental results on image and speaker datasets demonstrate its superiority with respect to state-of-the-art sparse and low-rank outlier detection methods.
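A minimal sketch of one boosting stage follows, assuming data points are stored as columns of X; the transition-matrix convention and the elastic-net hyperparameters are illustrative, and the cascade simply reapplies this stage to the returned residual and combines the per-stage scores:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def stage_scores(X, alpha=0.05, l1_ratio=0.5, steps=50):
    """One weak detector: elastic-net self-representation followed by a
    random walk on the coefficient graph. Points rarely visited by the
    walk (low stationary probability) are outlier-like. Returns the
    scores and the residual that feeds the next cascade stage."""
    n = X.shape[1]
    C = np.zeros((n, n))
    for j in range(n):
        mask = np.arange(n) != j          # a point may not represent itself
        enet = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, max_iter=5000)
        enet.fit(X[:, mask], X[:, j])
        C[mask, j] = enet.coef_
    P = np.abs(C)
    P /= P.sum(axis=0, keepdims=True) + 1e-12   # column-stochastic
    pi = np.full(n, 1.0 / n)
    for _ in range(steps):                       # power iteration
        pi = P @ pi
    return pi, X - X @ C                         # scores, residual
```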
    On the Convergence Rate of Gaussianization with Random Rotations. (arXiv:2306.13520v1 [cs.LG])
    Gaussianization is a simple generative model that can be trained without backpropagation. It has shown compelling performance on low dimensional data. As the dimension increases, however, it has been observed that the convergence speed slows down. We show analytically that the number of required layers scales linearly with the dimension for Gaussian input. We argue that this is because the model is unable to capture dependencies between dimensions. Empirically, we find the same linear increase in cost for arbitrary input $p(x)$, but observe favorable scaling for some distributions. We explore potential speed-ups and formulate challenges for further research.
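For reference, a single Gaussianization layer with a random rotation can be sketched in a few lines; the empirical-CDF marginal transform below is one standard choice:

```python
import numpy as np
from scipy import stats

def gaussianization_layer(X, rng):
    """One layer: random rotation, then Gaussianize each marginal via
    its empirical CDF. Stacking layers drives X toward an isotropic
    Gaussian; the analysis above says the number of layers needed
    grows linearly with the dimension for Gaussian inputs."""
    n, d = X.shape
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random rotation
    Z = X @ Q
    for j in range(d):
        ranks = stats.rankdata(Z[:, j]) / (n + 1.0)   # empirical CDF
        Z[:, j] = stats.norm.ppf(ranks)               # to standard normal
    return Z

# rng = np.random.default_rng(0)
# X = rng.standard_normal((1000, 5)) * [3.0, 1.0, 1.0, 1.0, 1.0]
# for _ in range(20):
#     X = gaussianization_layer(X, rng)
```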
    Manifold Contrastive Learning with Variational Lie Group Operators. (arXiv:2306.13544v1 [cs.LG])
    Self-supervised learning of deep neural networks has become a prevalent paradigm for learning representations that transfer to a variety of downstream tasks. Similar to proposed models of the ventral stream of biological vision, it is observed that these networks lead to a separation of category manifolds in the representations of the penultimate layer. Although this observation matches the manifold hypothesis of representation learning, current self-supervised approaches are limited in their ability to explicitly model this manifold. Indeed, current approaches often only apply augmentations from a pre-specified set of "positive pairs" during learning. In this work, we propose a contrastive learning approach that directly models the latent manifold using Lie group operators parameterized by coefficients with a sparsity-promoting prior. A variational distribution over these coefficients provides a generative model of the manifold, with samples which provide feature augmentations applicable both during contrastive training and downstream tasks. Additionally, learned coefficient distributions provide a quantification of which transformations are most likely at each point on the manifold while preserving identity. We demonstrate benefits in self-supervised benchmarks for image datasets, as well as a downstream semi-supervised task. In the former case, we demonstrate that the proposed methods can effectively apply manifold feature augmentations and improve learning both with and without a projection head. In the latter case, we demonstrate that feature augmentations sampled from learned Lie group operators can improve classification performance when using few labels.  ( 2 min )
    TACO: Temporal Latent Action-Driven Contrastive Loss for Visual Reinforcement Learning. (arXiv:2306.13229v1 [cs.LG])
Despite recent progress in reinforcement learning (RL) from raw pixel data, sample inefficiency continues to present a substantial obstacle. Prior works have attempted to address this challenge by creating self-supervised auxiliary tasks, aiming to enrich the agent's learned representations with control-relevant information for future state prediction. However, these objectives are often insufficient to learn representations that can represent the optimal policy or value function, and they often consider tasks with small, abstract discrete action spaces and thus overlook the importance of action representation learning in continuous control. In this paper, we introduce TACO: Temporal Action-driven Contrastive Learning, a simple yet powerful temporal contrastive learning approach that facilitates the concurrent acquisition of latent state and action representations for agents. TACO simultaneously learns a state and an action representation by optimizing the mutual information between representations of current states paired with action sequences and representations of the corresponding future states. Theoretically, TACO can be shown to learn state and action representations that encompass sufficient information for control, thereby improving sample efficiency. For online RL, TACO achieves a 40% performance boost after one million environment interaction steps on average across nine challenging visual continuous control tasks from the DeepMind Control Suite. In addition, we show that TACO can also serve as a plug-and-play module added to existing offline visual RL methods to establish new state-of-the-art performance for offline visual RL across offline datasets of varying quality.  ( 3 min )
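The core objective can be sketched as an InfoNCE loss between fused (state, action-sequence) representations and future-state representations; the projection head and the assumption that negatives come from the rest of the batch are illustrative, and TACO's full recipe differs in detail:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalActionContrast(nn.Module):
    """Sketch of a TACO-style head: fuse the current latent state with
    an encoded action sequence, project into the future-state space,
    and maximize mutual information via InfoNCE over the batch."""

    def __init__(self, state_dim, action_dim, temperature=0.1):
        super().__init__()
        self.proj = nn.Linear(state_dim + action_dim, state_dim)
        self.temperature = temperature

    def forward(self, z_t, u_seq, z_future):
        query = F.normalize(self.proj(torch.cat([z_t, u_seq], -1)), dim=-1)
        key = F.normalize(z_future, dim=-1)
        logits = query @ key.t() / self.temperature  # positives on diagonal
        targets = torch.arange(logits.size(0), device=logits.device)
        return F.cross_entropy(logits, targets)
```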
    Recurrent Graph Convolutional Networks for Spatiotemporal Prediction of Snow Accumulation Using Airborne Radar. (arXiv:2302.00817v2 [cs.LG] UPDATED)
    The accurate prediction and estimation of annual snow accumulation has grown in importance as we deal with the effects of climate change and the increase of global atmospheric temperatures. Airborne radar sensors, such as the Snow Radar, are able to measure accumulation rate patterns at a large-scale and monitor the effects of ongoing climate change on Greenland's precipitation and run-off. The Snow Radar's use of an ultra-wide bandwidth enables a fine vertical resolution that helps in capturing internal ice layers. Given the amount of snow accumulation in previous years using the radar data, in this paper, we propose a machine learning model based on recurrent graph convolutional networks to predict the snow accumulation in recent consecutive years at a certain location. We found that the model performs better and with more consistency than equivalent nongeometric and nontemporal models.  ( 2 min )
    A Weighted Autoencoder-Based Approach to Downlink NOMA Constellation Design. (arXiv:2306.13423v1 [eess.SP])
    End-to-end design of communication systems using deep autoencoders (AEs) is gaining attention due to its flexibility and excellent performance. Besides single-user transmission, AE-based design is recently explored in multi-user setup, e.g., for designing constellations for non-orthogonal multiple access (NOMA). In this paper, we further advance the design of AE-based downlink NOMA by introducing weighted loss function in the AE training. By changing the weight coefficients, one can flexibly tune the constellation design to balance error probability of different users, without relying on explicit information about their channel quality. Combined with the SICNet decoder, we demonstrate a significant improvement in achievable levels and flexible control of error probability of different users using the proposed weighted AE-based framework.  ( 2 min )
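The weighted objective itself is simple; a sketch, with names and shapes chosen for illustration:

```python
import torch.nn.functional as F

def weighted_noma_loss(user_logits, user_labels, weights):
    """Per-user cross-entropy terms on the decoded messages, mixed by
    tunable coefficients. Raising weights[k] pushes the learned
    superposed constellation to protect user k, without needing
    explicit channel-quality information."""
    return sum(w * F.cross_entropy(logits, labels)
               for logits, labels, w in zip(user_logits, user_labels, weights))
```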
    SC-Block: Supervised Contrastive Blocking within Entity Resolution Pipelines. (arXiv:2303.03132v2 [cs.DB] UPDATED)
The goal of entity resolution is to identify records in multiple datasets that represent the same real-world entity. However, comparing all records across datasets can be computationally intensive, leading to long runtimes. To reduce these runtimes, entity resolution pipelines are constructed of two parts: a blocker that applies a computationally cheap method to select candidate record pairs, and a matcher that afterwards identifies matching pairs from this set using more expensive methods. This paper presents SC-Block, a blocking method that utilizes supervised contrastive learning for positioning records in the embedding space, and nearest neighbour search for candidate set building. We benchmark SC-Block against eight state-of-the-art blocking methods. In order to relate the training time of SC-Block to the reduction of the overall runtime of the entity resolution pipeline, we combine SC-Block with four matching methods into complete pipelines. For measuring the overall runtime, we determine candidate sets with 99.5% pair completeness and pass them to the matcher. The results show that SC-Block is able to create smaller candidate sets, and pipelines with SC-Block execute 1.5 to 2 times faster compared to pipelines with other blockers, without sacrificing F1 score. Blockers are often evaluated using relatively small datasets, which might lead to runtime effects resulting from a large vocabulary size being overlooked. In order to measure runtimes in a more challenging setting, we introduce a new benchmark dataset that requires large numbers of product offers to be blocked. On this large-scale benchmark dataset, pipelines utilizing SC-Block and the best-performing matcher execute 8 times faster than pipelines utilizing another blocker with the same matcher, reducing the runtime from 2.5 hours to 18 minutes and clearly compensating for the 5 minutes required for training SC-Block.  ( 3 min )
    Enhanced Dengue Outbreak Prediction in Tamilnadu using Meteorological and Entomological data. (arXiv:2306.13456v1 [cs.LG])
This paper focuses on studying the impact of climate data and vector larval indices on dengue outbreaks. After a comparative study of various LSTM models, a Bidirectional Stacked LSTM network is selected to analyze the time series climate data and health data collected for the state of Tamil Nadu (India) for the period 2014 to 2020. Prediction accuracy of the model is significantly improved by including the mosquito larval index, an indicator of vector-borne disease (VBD) control measures.  ( 2 min )
    Single-Leg Revenue Management with Advice. (arXiv:2202.10939v3 [cs.GT] UPDATED)
Single-leg revenue management is a foundational problem of revenue management that has been particularly impactful in the airline and hotel industries: given $n$ units of a resource, e.g. flight seats, and a stream of sequentially-arriving customers segmented by fares, what is the optimal online policy for allocating the resource? Previous work focused on designing algorithms when forecasts are available, which are not robust to inaccuracies in the forecast, or online algorithms with worst-case performance guarantees, which can be too conservative in practice. In this work, we look at the single-leg revenue management problem through the lens of the algorithms-with-advice framework, which attempts to harness the increasing prediction accuracy of machine learning methods by optimally incorporating advice about the future into online algorithms. In particular, we characterize the Pareto frontier that captures the tradeoff between consistency (performance when advice is accurate) and competitiveness (performance when advice is inaccurate) for every advice. Moreover, we provide an online algorithm that always achieves performance on this Pareto frontier. We also study the class of protection level policies, which is the most widely-deployed technique for single-leg revenue management: we provide an algorithm to incorporate advice into protection levels that optimally trades off consistency and competitiveness. Moreover, we empirically evaluate the performance of these algorithms on synthetic data. We find that our algorithm for protection level policies performs remarkably well on most instances, even if it is not guaranteed to be on the Pareto frontier in theory. Our results extend to other unit-cost online allocation problems such as display advertising and the multiple secretary problem, together with more general variable-cost problems such as the online knapsack problem.  ( 3 min )
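For readers unfamiliar with protection level policies, here is a minimal sketch of how such a policy acts on an arrival stream; how the levels are set from advice is exactly where the paper's contribution lies, so `levels` is just an input here:

```python
def protection_level_policy(fares, arrivals, capacity, levels):
    """Nested protection levels for single-leg RM. Fare classes are
    indexed from highest (0) to lowest; levels[k] seats are protected
    for classes above k, so a class-k request is accepted only if
    serving it still leaves those seats free."""
    remaining, revenue = capacity, 0.0
    for k in arrivals:                    # stream of fare-class indices
        if remaining - 1 >= levels[k]:
            remaining -= 1
            revenue += fares[k]
    return revenue

# e.g. protect 3 of 5 seats for the high fare:
# protection_level_policy(fares=[300, 100], arrivals=[1, 1, 1, 0, 0],
#                         capacity=5, levels=[0, 3])  # -> 800.0
```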
    Directional diffusion models for graph representation learning. (arXiv:2306.13210v1 [cs.LG])
In recent years, diffusion models have achieved remarkable success in various domains of artificial intelligence, such as image synthesis, super-resolution, and 3D molecule generation. However, the application of diffusion models in graph learning has received relatively little attention. In this paper, we address this gap by investigating the use of diffusion models for unsupervised graph representation learning. We begin by identifying the anisotropic structures of graphs and a crucial limitation of the vanilla forward diffusion process in learning anisotropic structures. This process relies on continuously adding an isotropic Gaussian noise to the data, which may convert the anisotropic signals to noise too quickly. This rapid conversion hampers the training of denoising neural networks and impedes the acquisition of semantically meaningful representations in the reverse process. To address this challenge, we propose a new class of models called directional diffusion models. These models incorporate data-dependent, anisotropic, and directional noises in the forward diffusion process. To assess the efficacy of our proposed models, we conduct extensive experiments on 12 publicly available datasets, focusing on two distinct graph representation learning tasks. The experimental results demonstrate the superiority of our models over state-of-the-art baselines, indicating their effectiveness in capturing meaningful graph representations. Our studies not only provide valuable insights into the forward process of diffusion models but also highlight the wide-ranging potential of these models for various graph-related tasks.  ( 2 min )
    Evaluating the Robustness of Text-to-image Diffusion Models against Real-world Attacks. (arXiv:2306.13103v1 [cs.CR])
    Text-to-image (T2I) diffusion models (DMs) have shown promise in generating high-quality images from textual descriptions. The real-world applications of these models require particular attention to their safety and fidelity, but this has not been sufficiently explored. One fundamental question is whether existing T2I DMs are robust against variations over input texts. To answer it, this work provides the first robustness evaluation of T2I DMs against real-world attacks. Unlike prior studies that focus on malicious attacks involving apocryphal alterations to the input texts, we consider an attack space spanned by realistic errors (e.g., typo, glyph, phonetic) that humans can make, to ensure semantic consistency. Given the inherent randomness of the generation process, we develop novel distribution-based attack objectives to mislead T2I DMs. We perform attacks in a black-box manner without any knowledge of the model. Extensive experiments demonstrate the effectiveness of our method for attacking popular T2I DMs and simultaneously reveal their non-trivial robustness issues. Moreover, we provide an in-depth analysis of our method to show that it is not designed to attack the text encoder in T2I DMs solely.  ( 2 min )
    Explainable Lifelong Stream Learning Based on "Glocal" Pairwise Fusion. (arXiv:2306.13410v1 [cs.LG])
Real-time on-device continual learning applications are used on mobile phones, consumer robots, and smart appliances. Such devices have limited processing and memory storage capabilities, whereas continual learning acquires data over a long period of time. By necessity, lifelong learning algorithms have to be able to operate under such constraints while delivering good performance. This study presents the Explainable Lifelong Learning (ExLL) model, which incorporates several important traits: 1) learning to learn, in a single pass, from streaming data with scarce examples and resources; 2) a self-organizing prototype-based architecture that expands as needed and clusters streaming data into separable groups by similarity and preserves data against catastrophic forgetting; 3) an interpretable architecture to convert the clusters into explainable IF-THEN rules as well as to justify model predictions in terms of what is similar and dissimilar to the inference; and 4) inferences at the global and local level using a pairwise decision fusion process to enhance the accuracy of the inference, hence "Glocal Pairwise Fusion." We compare ExLL against contemporary online learning algorithms for image recognition, using OpenLoris, F-SIOL-310, and Places datasets to evaluate several continual learning scenarios for video streams, low-sample learning, ability to scale, and imbalanced data streams. The algorithms are evaluated for their performance in accuracy, number of parameters, and experiment runtime requirements. ExLL outperforms all algorithms for accuracy in the majority of the tested scenarios.  ( 2 min )
    Online Resource Allocation under Horizon Uncertainty. (arXiv:2206.13606v3 [cs.DS] UPDATED)
    We study stochastic online resource allocation: a decision maker needs to allocate limited resources to stochastically-generated sequentially-arriving requests in order to maximize reward. At each time step, requests are drawn independently from a distribution that is unknown to the decision maker. Online resource allocation and its special cases have been studied extensively in the past, but prior results crucially and universally rely on the strong assumption that the total number of requests (the horizon) is known to the decision maker in advance. In many applications, such as revenue management and online advertising, the number of requests can vary widely because of fluctuations in demand or user traffic intensity. In this work, we develop online algorithms that are robust to horizon uncertainty. In sharp contrast to the known-horizon setting, no algorithm can achieve even a constant asymptotic competitive ratio that is independent of the horizon uncertainty. We introduce a novel generalization of dual mirror descent which allows the decision maker to specify a schedule of time-varying target consumption rates, and prove corresponding performance guarantees. We go on to give a fast algorithm for computing a schedule of target consumption rates that leads to near-optimal performance in the unknown-horizon setting. In particular, our competitive ratio attains the optimal rate of growth (up to logarithmic factors) as the horizon uncertainty grows large. Finally, we also provide a way to incorporate machine-learned predictions about the horizon which interpolates between the known and unknown horizon settings.  ( 3 min )
    Large-step neural network for learning the symplectic evolution from partitioned data. (arXiv:2208.14148v2 [astro-ph.EP] UPDATED)
    In this study, we focus on learning Hamiltonian systems, which involves predicting the coordinate (q) and momentum (p) variables generated by a symplectic mapping. Based on Chen & Tao (2021), the symplectic mapping is represented by a generating function. To extend the prediction time period, we develop a new learning scheme by splitting the time series (q_i, p_i) into several partitions. We then train a large-step neural network (LSNN) to approximate the generating function between the first partition (i.e. the initial condition) and each one of the remaining partitions. This partition approach makes our LSNN effectively suppress the accumulative error when predicting the system evolution. Then we train the LSNN to learn the motions of the 2:3 resonant Kuiper belt objects for a long time period of 25000 yr. The results show that there are two significant improvements over the neural network constructed in our previous work (Li et al. 2022): (1) the conservation of the Jacobi integral, and (2) the highly accurate predictions of the orbital evolution. Overall, we propose that the designed LSNN has the potential to considerably improve predictions of the long-term evolution of more general Hamiltonian systems.  ( 3 min )
    Physics-constrained Random Forests for Turbulence Model Uncertainty Estimation. (arXiv:2306.13370v1 [cs.LG])
To achieve virtual certification for industrial design, quantifying the uncertainties in simulation-driven processes is crucial. We discuss a physics-constrained approach to account for the epistemic uncertainty of turbulence models. In order to eliminate user input, we incorporate a data-driven machine learning strategy. In addition, our study focuses on developing an a priori estimate of prediction confidence when accurate data is scarce.  ( 2 min )
    TACOformer:Token-channel compounded Cross Attention for Multimodal Emotion Recognition. (arXiv:2306.13592v1 [cs.MM])
Recently, emotion recognition based on physiological signals has emerged as a field with intensive research. The utilization of multi-modal, multi-channel physiological signals has significantly improved the performance of emotion recognition systems, due to their complementarity. However, effectively integrating emotion-related semantic information from different modalities and capturing inter-modal dependencies remains a challenging issue. Many existing multimodal fusion methods ignore either token-to-token or channel-to-channel correlations of multichannel signals from different modalities, which limits the classification capability of the models to some extent. In this paper, we propose a comprehensive perspective of multimodal fusion that integrates channel-level and token-level cross-modal interactions. Specifically, we introduce a unified cross attention module called Token-chAnnel COmpound (TACO) Cross Attention to perform multimodal fusion, which simultaneously models channel-level and token-level dependencies between modalities. Additionally, we propose a 2D position encoding method to preserve information about the spatial distribution of EEG signal channels, and we use two transformer encoders ahead of the fusion module to capture long-term temporal dependencies from the EEG signal and the peripheral physiological signal, respectively. Subject-independent experiments on the emotional datasets DEAP and Dreamer demonstrate that the proposed model achieves state-of-the-art performance.  ( 2 min )
    Logarithmic Regret for Matrix Games against an Adversary with Noisy Bandit Feedback. (arXiv:2306.13233v1 [cs.LG])
    This paper considers a variant of zero-sum matrix games where at each timestep the row player chooses row $i$, the column player chooses column $j$, and the row player receives a noisy reward with mean $A_{i,j}$. The objective of the row player is to accumulate as much reward as possible, even against an adversarial column player. If the row player uses the EXP3 strategy, an algorithm known for obtaining $\sqrt{T}$ regret against an arbitrary sequence of rewards, it is immediate that the row player also achieves $\sqrt{T}$ regret relative to the Nash equilibrium in this game setting. However, partly motivated by the fact that the EXP3 strategy is myopic to the structure of the game, O'Donoghue et al. (2021) proposed a UCB-style algorithm that leverages the game structure and demonstrated that this algorithm greatly outperforms EXP3 empirically. While they showed that this UCB-style algorithm achieved $\sqrt{T}$ regret, in this paper we ask if there exists an algorithm that provably achieves $\text{polylog}(T)$ regret against any adversary, analogous to results from stochastic bandits. We propose a novel algorithm that answers this question in the affirmative for the simple $2 \times 2$ setting, providing the first instance-dependent guarantees for games in the regret setting. Our algorithm overcomes two major hurdles: 1) obtaining logarithmic regret even though the Nash equilibrium is estimable only at a $1/\sqrt{T}$ rate, and 2) designing row-player strategies that guarantee that either the adversary provides information about the Nash equilibrium, or the row player incurs negative regret. Moreover, in the full information case we address the general $n \times m$ case where the first hurdle is still relevant. Finally, we show that EXP3 and the UCB-based algorithm necessarily cannot perform better than $\sqrt{T}$.  ( 3 min )
    FedSelect: Customized Selection of Parameters for Fine-Tuning during Personalized Federated Learning. (arXiv:2306.13264v1 [cs.LG])
Recent advancements in federated learning (FL) seek to increase client-level performance by fine-tuning client parameters on local data or personalizing architectures for the local task. Existing methods for such personalization either prune a global model or fine-tune a global model on a local client distribution. However, these existing methods either personalize at the expense of retaining important global knowledge, or predetermine network layers for fine-tuning, resulting in suboptimal storage of global knowledge within client models. Inspired by the lottery ticket hypothesis, we first introduce a hypothesis for finding optimal client subnetworks to locally fine-tune while leaving the rest of the parameters frozen. We then propose a novel FL framework, FedSelect, using this procedure that directly personalizes both client subnetwork structure and parameters, via the simultaneous discovery of optimal parameters for personalization and the rest of the parameters for global aggregation during training. We show that this method achieves promising results on CIFAR-10.  ( 2 min )
    Pruning for Better Domain Generalizability. (arXiv:2306.13237v1 [cs.LG])
In this paper, we investigate whether pruning can serve as a reliable method to boost the generalization ability of a model. We find that existing pruning methods like L2 can already offer a small improvement in target domain performance. We further propose a novel pruning scoring method, called DSS, designed not to maintain source accuracy as typical pruning work does, but to directly enhance the robustness of the model. We conduct empirical experiments to validate our method and demonstrate that it can even be combined with state-of-the-art generalization work like MIRO (Cha et al., 2022) to further boost performance. On MNIST to MNIST-M, we improve the baseline performance by over 5 points by introducing 60% channel sparsity into the model. On the DomainBed benchmark with state-of-the-art MIRO, we can further boost its performance by 1 point by introducing only 10% sparsity into the model. Code can be found at: https://github.com/AlexSunNik/Pruning-for-Better-Domain-Generalizability  ( 2 min )
    Decoding Urban-health Nexus: Interpretable Machine Learning Illuminates Cancer Prevalence based on Intertwined City Features. (arXiv:2306.11847v2 [cs.LG] UPDATED)
    This study investigates the interplay among social demographics, built environment characteristics, and environmental hazard exposure features in determining community level cancer prevalence. Utilizing data from five Metropolitan Statistical Areas in the United States: Chicago, Dallas, Houston, Los Angeles, and New York, the study implemented an XGBoost machine learning model to predict the extent of cancer prevalence and evaluate the importance of different features. Our model demonstrates reliable performance, with results indicating that age, minority status, and population density are among the most influential factors in cancer prevalence. We further explore urban development and design strategies that could mitigate cancer prevalence, focusing on green space, developed areas, and total emissions. Through a series of experimental evaluations based on causal inference, the results show that increasing green space and reducing developed areas and total emissions could alleviate cancer prevalence. The study and findings contribute to a better understanding of the interplay among urban features and community health and also show the value of interpretable machine learning models for integrated urban design to promote public health. The findings also provide actionable insights for urban planning and design, emphasizing the need for a multifaceted approach to addressing urban health disparities through integrated urban design strategies.  ( 2 min )
    MarioGPT: Open-Ended Text2Level Generation through Large Language Models. (arXiv:2302.05981v2 [cs.AI] CROSS LISTED)
Procedural Content Generation (PCG) algorithms provide a technique to generate complex and diverse environments in an automated way. However, while generating content with PCG methods is often straightforward, generating meaningful content that reflects specific intentions and constraints remains challenging. Furthermore, many PCG algorithms lack the ability to generate content in an open-ended manner. Recently, Large Language Models (LLMs) have been shown to be incredibly effective in many diverse domains. These trained LLMs can be fine-tuned, re-using information and accelerating training for new tasks. In this work, we introduce MarioGPT, a fine-tuned GPT2 model trained to generate tile-based game levels, in our case Super Mario Bros levels. We show that MarioGPT can not only generate diverse levels, but can be text-prompted for controllable level generation, addressing one of the key challenges of current PCG techniques. As far as we know, MarioGPT is the first text-to-level model. We also combine MarioGPT with novelty search, enabling it to generate diverse levels with varying play-style dynamics (i.e. player paths). This combination allows for the open-ended generation of an increasingly diverse range of content.  ( 2 min )
    EEG Decoding for Datasets with Heterogenous Electrode Configurations using Transfer Learning Graph Neural Networks. (arXiv:2306.13109v1 [eess.SP])
Brain-Machine Interfacing (BMI) has greatly benefited from adopting machine learning methods for feature learning that require extensive data for training, which are often unavailable from a single dataset. Yet, it is difficult to combine data across labs, or even data within the same lab collected over the years, due to variations in recording equipment and electrode layouts that result in shifts in data distribution, changes in data dimensionality, and altered identity of data dimensions. Our objective is to overcome this limitation and learn from many different and diverse datasets across labs with different experimental protocols. To tackle the domain adaptation problem, we developed a novel machine learning framework combining graph neural networks (GNNs) and transfer learning methodologies for non-invasive Motor Imagery (MI) EEG decoding, as an example of BMI. Empirically, we focus on the challenges of learning from EEG data with different electrode layouts and varying numbers of electrodes. We utilise three MI EEG databases collected using very different numbers of EEG sensors (from 22 channels to 64) and layouts (from custom layouts to 10-20). Our model achieved the highest accuracy with lower standard deviations on the testing datasets. This indicates that the GNN-based transfer learning framework can effectively aggregate knowledge from multiple datasets with different electrode layouts, leading to improved generalization in subject-independent MI EEG classification. The findings of this study have important implications for Brain-Computer-Interface (BCI) research, as they highlight a promising method for overcoming the limitations posed by non-unified experimental setups. By enabling the integration of diverse datasets with varying electrode layouts, our proposed approach can help advance the development and application of BMI technologies.  ( 3 min )
    Key Frame Extraction with Attention Based Deep Neural Networks. (arXiv:2306.13176v1 [cs.CV])
Automatic keyframe detection from videos is an exercise in selecting scenes that can best summarize the content of long videos. Providing a summary of the video is an important task that facilitates quick browsing and content summarization. The resulting frames are used for automated tasks (e.g. summarizing security footage, detecting the different scenes used in music clips) in various industries. In addition, processing high-volume videos with advanced machine learning methods incurs substantial resource costs; the extracted keyframes can instead serve as input features for the methods and models to be used. In this study, we propose a deep learning-based approach for keyframe detection using a deep auto-encoder model with an attention layer. The proposed method first extracts features from the video frames using the encoder part of the autoencoder and applies segmentation using the k-means clustering algorithm to group these features and similar frames together. Then, keyframes are selected from each cluster by choosing the frames closest to the cluster centers. The method was evaluated on the TVSUM video dataset and achieved a classification accuracy of 0.77, indicating a higher success rate than many existing methods. The proposed method offers a promising solution for keyframe extraction in video analysis and can be applied to various applications such as video summarization and video retrieval.  ( 2 min )
    Reinforcement Learning in Non-Markovian Environments. (arXiv:2211.01595v2 [eess.SY] UPDATED)
    Motivated by the novel paradigm developed by Van Roy and coauthors for reinforcement learning in arbitrary non-Markovian environments, we propose a related formulation and explicitly pin down the error caused by non-Markovianity of observations when the Q-learning algorithm is applied on this formulation. Based on this observation, we propose that the criterion for agent design should be to seek good approximations for certain conditional laws. Inspired by classical stochastic control, we show that our problem reduces to that of recursive computation of approximate sufficient statistics. This leads to an autoencoder-based scheme for agent design which is then numerically tested on partially observed reinforcement learning environments.  ( 2 min )
    Differentially Private Synthetic Data Using KD-Trees. (arXiv:2306.13211v1 [cs.CR])
Creation of a synthetic dataset that faithfully represents the data distribution and simultaneously preserves privacy is a major research challenge. Many space partitioning based approaches have emerged in recent years for answering statistical queries in a differentially private manner. However, for the synthetic data generation problem, recent research has mainly focused on deep generative models. In contrast, we exploit space partitioning techniques together with noise perturbation and thus achieve intuitive and transparent algorithms. We propose both data-independent and data-dependent algorithms for $\epsilon$-differentially private synthetic data generation whose kernel density resembles that of the real dataset. Additionally, we provide theoretical results on the utility-privacy trade-offs and show how our data-dependent approach overcomes the curse of dimensionality and leads to a scalable algorithm. We show empirical utility improvements over the prior work, and discuss the performance of our algorithm on a downstream classification task on a real dataset.  ( 2 min )
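A toy data-independent variant conveys the flavor of the approach (midpoint splits, Laplace-noised leaf counts, uniform resampling within leaves); the paper's data-dependent algorithm adapts the partition to the data, which this sketch does not:

```python
import numpy as np

def dp_partition_synth(X, epsilon, depth=6, rng=None):
    """Epsilon-DP synthetic data over [0,1]^d: fixed KD-tree-style
    midpoint splits (data-independent), Laplace noise on leaf counts
    (count sensitivity 1), then uniform sampling inside each leaf in
    proportion to the noisy counts."""
    rng = rng or np.random.default_rng()
    n, d = X.shape
    boxes = [(np.zeros(d), np.ones(d))]
    for level in range(depth):                       # cycle split axes
        axis, new = level % d, []
        for lo, hi in boxes:
            mid = (lo[axis] + hi[axis]) / 2.0
            lo2, hi2 = lo.copy(), hi.copy()
            hi2[axis], lo2[axis] = mid, mid
            new += [(lo, hi2), (lo2, hi)]
        boxes = new
    counts = np.array([np.all((X >= lo) & (X < hi), axis=1).sum()
                       for lo, hi in boxes], dtype=float)
    noisy = np.maximum(counts + rng.laplace(0, 1.0 / epsilon, len(boxes)), 0)
    if noisy.sum() == 0:                             # degenerate fallback
        noisy += 1.0
    picks = rng.choice(len(boxes), size=n, p=noisy / noisy.sum())
    return np.array([rng.uniform(*boxes[i]) for i in picks])
```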
    DISCO-10M: A Large-Scale Music Dataset. (arXiv:2306.13512v1 [cs.SD])
    Music datasets play a crucial role in advancing research in machine learning for music. However, existing music datasets suffer from limited size, accessibility, and lack of audio resources. To address these shortcomings, we present DISCO-10M, a novel and extensive music dataset that surpasses the largest previously available music dataset by an order of magnitude. To ensure high-quality data, we implement a multi-stage filtering process. This process incorporates similarities based on textual descriptions and audio embeddings. Moreover, we provide precomputed CLAP embeddings alongside DISCO-10M, facilitating direct application on various downstream tasks. These embeddings enable efficient exploration of machine learning applications on the provided data. With DISCO-10M, we aim to democratize and facilitate new research to help advance the development of novel machine learning models for music.  ( 2 min )
    Bayesian Networks for the robust and unbiased prediction of depression and its symptoms utilizing speech and multimodal data. (arXiv:2211.04924v2 [cs.LG] UPDATED)
    Predicting the presence of major depressive disorder (MDD) using behavioural and cognitive signals is a highly non-trivial task. The heterogeneous clinical profile of MDD means that any given speech, facial expression and/or observed cognitive pattern may be associated with a unique combination of depressive symptoms. Conventional discriminative machine learning models potentially lack the complexity to robustly model this heterogeneity. Bayesian networks, however, may instead be well-suited to such a scenario. These networks are probabilistic graphical models that efficiently describe the joint probability distribution over a set of random variables by explicitly capturing their conditional dependencies. This framework provides further advantages over standard discriminative modelling by offering the possibility to incorporate expert opinion in the graphical structure of the models, generating explainable model predictions, informing about the uncertainty of predictions, and naturally handling missing data. In this study, we apply a Bayesian framework to capture the relationships between depression, depression symptoms, and features derived from speech, facial expression and cognitive game data collected at thymia.  ( 2 min )
    Multi-task Learning for Radar Signal Characterisation. (arXiv:2306.13105v1 [eess.SP])
    Radio signal recognition is a crucial task in both civilian and military applications, as accurate and timely identification of unknown signals is an essential part of spectrum management and electronic warfare. The majority of research in this field has focused on applying deep learning for modulation classification, leaving the task of signal characterisation as an understudied area. This paper addresses this gap by presenting an approach for tackling radar signal classification and characterisation as a multi-task learning (MTL) problem. We propose the IQ Signal Transformer (IQST) among several reference architectures that allow for simultaneous optimisation of multiple regression and classification tasks. We demonstrate the performance of our proposed MTL model on a synthetic radar dataset, while also providing a first-of-its-kind benchmark for radar signal characterisation.  ( 2 min )
    Two derivations of Principal Component Analysis on datasets of distributions. (arXiv:2306.13503v1 [stat.ML])
    In this brief note, we formulate Principal Component Analysis (PCA) over datasets consisting not of points but of distributions, characterized by their location and covariance. Just like the usual PCA on points can be equivalently derived via a variance-maximization principle and via a minimization of reconstruction error, we derive a closed-form solution for distributional PCA from both of these perspectives.  ( 2 min )
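The variance-maximization derivation has a particularly compact form: by the law of total covariance, the scatter of the dataset of distributions splits into the covariance of the locations plus the average member covariance, so the components are the top eigenvectors of that sum. A sketch, assuming equal weights:

```python
import numpy as np

def distributional_pca(means, covs, k):
    """PCA over m distributions given by means (m, d) and covariances
    (m, d, d): eigendecompose between-scatter + average within-covariance
    and return the top-k components and their variances."""
    centered = means - means.mean(axis=0)
    between = centered.T @ centered / len(means)  # covariance of locations
    within = covs.mean(axis=0)                    # average member covariance
    evals, evecs = np.linalg.eigh(between + within)
    order = np.argsort(evals)[::-1][:k]
    return evecs[:, order], evals[order]
```

Setting all member covariances to zero recovers ordinary PCA on the means.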
    Precise Asymptotic Generalization for Multiclass Classification with Overparameterized Linear Models. (arXiv:2306.13255v1 [cs.LG])
We study the asymptotic generalization of an overparameterized linear model for multiclass classification under the Gaussian covariates bi-level model introduced in Subramanian et al. '22, where the number of data points, features, and classes all grow together. We fully resolve the conjecture posed in Subramanian et al. '22, matching the predicted regimes for generalization. Furthermore, our new lower bounds are akin to an information-theoretic strong converse: they establish that the misclassification rate goes to 0 or 1 asymptotically. One surprising consequence of our tight results is that the min-norm interpolating classifier can be asymptotically suboptimal relative to noninterpolating classifiers in the regime where the min-norm interpolating regressor is known to be optimal. The key to our tight analysis is a new variant of the Hanson-Wright inequality which is broadly useful for multiclass problems with sparse labels. As an application, we show that the same type of analysis can be used to analyze the related multilabel classification problem under the same bi-level ensemble.  ( 2 min )
    Uniform Convergence with Square-Root Lipschitz Loss. (arXiv:2306.13188v1 [stat.ML])
    We establish generic uniform convergence guarantees for Gaussian data in terms of the Rademacher complexity of the hypothesis class and the Lipschitz constant of the square root of the scalar loss function. We show how these guarantees substantially generalize previous results based on smoothness (Lipschitz constant of the derivative), and allow us to handle the broader class of square-root-Lipschitz losses, which includes also non-smooth loss functions appropriate for studying phase retrieval and ReLU regression, as well as rederive and better understand "optimistic rate" and interpolation learning guarantees.  ( 2 min )
    MBrain: A Multi-channel Self-Supervised Learning Framework for Brain Signals. (arXiv:2306.13102v1 [eess.SP])
Brain signals are important quantitative data for understanding physiological activities and diseases of the human brain. Most existing studies pay attention to supervised learning methods, which, however, require high-cost clinical labels. In addition, the huge difference in the clinical patterns of brain signals measured by invasive (e.g., SEEG) and non-invasive (e.g., EEG) methods leads to the lack of a unified method. To handle the above issues, we propose to study the self-supervised learning (SSL) framework for brain signals that can be applied to pre-train either SEEG or EEG data. Intuitively, brain signals, generated by the firing of neurons, are transmitted among different connecting structures in the human brain. Inspired by this, we propose MBrain to learn implicit spatial and temporal correlations between different channels (i.e., contacts of the electrode, corresponding to different brain areas) as the cornerstone for uniformly modeling different types of brain signals. Specifically, we represent the spatial correlation by a graph structure, which is built with the proposed multi-channel CPC. We theoretically prove that optimizing the goal of multi-channel CPC can lead to a better predictive representation, and apply the instantaneous-time-shift prediction task based on it. Then we capture the temporal correlation by designing the delayed-time-shift prediction task. Finally, the replace-discriminative-learning task is proposed to preserve the characteristics of each channel. Extensive experiments of seizure detection on both EEG and SEEG large-scale real-world datasets demonstrate that our model outperforms several state-of-the-art time series SSL and unsupervised models, and has the ability to be deployed to clinical practice.  ( 3 min )
    Human-in-the-Loop Optimization for Deep Stimulus Encoding in Visual Prostheses. (arXiv:2306.13104v1 [q-bio.NC])
    Neuroprostheses show potential in restoring lost sensory function and enhancing human capabilities, but the sensations produced by current devices often seem unnatural or distorted. Exact placement of implants and differences in individual perception lead to significant variations in stimulus response, making personalized stimulus optimization a key challenge. Bayesian optimization could be used to optimize patient-specific stimulation parameters with limited noisy observations, but is not feasible for high-dimensional stimuli. Alternatively, deep learning models can optimize stimulus encoding strategies, but typically assume perfect knowledge of patient-specific variations. Here we propose a novel, practically feasible approach that overcomes both of these fundamental limitations. First, a deep encoder network is trained to produce optimal stimuli for any individual patient by inverting a forward model mapping electrical stimuli to visual percepts. Second, a preferential Bayesian optimization strategy utilizes this encoder to optimize patient-specific parameters for a new patient, using a minimal number of pairwise comparisons between candidate stimuli. We demonstrate the viability of this approach on a novel, state-of-the-art visual prosthesis model. We show that our approach quickly learns a personalized stimulus encoder, leads to dramatic improvements in the quality of restored vision, and is robust to noisy patient feedback and misspecifications in the underlying forward model. Overall, our results suggest that combining the strengths of deep learning and Bayesian optimization could significantly improve the perceptual experience of patients fitted with visual prostheses and may prove a viable solution for a range of neuroprosthetic technologies.  ( 3 min )
    Optimal Sensor Placement with Adaptive Constraints for Nuclear Digital Twins. (arXiv:2306.13637v1 [math.OC])
    Given harsh operating conditions and physical constraints in reactors, nuclear applications cannot afford to equip the physical asset with a large array of sensors. Therefore, it is crucial to carefully determine the placement of sensors within the given spatial limitations, enabling the reconstruction of reactor flow fields and the creation of nuclear digital twins. Various design considerations are imposed, such as predetermined sensor locations, restricted areas within the reactor, a fixed number of sensors allocated to a specific region, or sensors positioned at a designated distance from one another. We develop a data-driven technique that integrates constraints into an optimization procedure for sensor placement, aiming to minimize reconstruction errors. Our approach employs a greedy algorithm that can optimize sensor locations on a grid, adhering to user-defined constraints. We demonstrate the near optimality of our algorithm by computing all possible configurations for selecting a certain number of sensors for a randomly generated state space system. In this work, the algorithm is demonstrated on the Out-of-Pile Testing and Instrumentation Transient Water Irradiation System (OPTI-TWIST) prototype vessel, which is electrically heated to mimic the neutronics effect of the Transient Reactor Test facility (TREAT) at Idaho National Laboratory (INL). The resulting sensor-based reconstruction of temperature within the OPTI-TWIST minimizes error, provides probabilistic bounds for noise-induced uncertainty and will finally be used for communication between the digital twin and experimental facility.  ( 2 min )
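A sketch of the greedy, constraint-aware selection loop, assuming a modal basis U (locations x modes) from which flow fields are reconstructed by least squares; the log-volume score and the specific constraint handling below are illustrative stand-ins for the paper's exact criteria:

```python
import numpy as np

def greedy_placement(U, num_sensors, allowed, coords=None, min_dist=0.0):
    """Greedily pick sensor rows of U that maximize the log-volume of
    the selected-row Gram matrix (a proxy for reconstruction quality),
    honoring user constraints: 'allowed' masks out restricted areas or
    encodes preselected regions, and a minimum pairwise distance is
    enforced when coordinates are supplied."""
    chosen, allowed = [], allowed.copy()
    for _ in range(num_sensors):
        candidates = np.where(allowed)[0]
        if len(candidates) == 0:
            break                                   # constraints exhausted
        scores = [np.linalg.slogdet(U[chosen + [i]] @ U[chosen + [i]].T)[1]
                  for i in candidates]
        best = candidates[int(np.argmax(scores))]
        chosen.append(best)
        allowed[best] = False
        if coords is not None and min_dist > 0:     # spacing constraint
            near = np.linalg.norm(coords - coords[best], axis=1) < min_dist
            allowed[near] = False
    return chosen
```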
    DeepGraviLens: a Multi-Modal Architecture for Classifying Gravitational Lensing Data. (arXiv:2205.00701v4 [astro-ph.IM] UPDATED)
Gravitational lensing is the relativistic effect generated by massive bodies, which bend the space-time surrounding them. It is a deeply investigated topic in astrophysics and allows validating theoretical relativistic results and studying faint astrophysical objects that would not be visible otherwise. In recent years Machine Learning methods have been applied to support the analysis of the gravitational lensing phenomena by detecting lensing effects in data sets consisting of images associated with brightness variation time series. However, the state-of-the-art approaches either consider only images and neglect time-series data or achieve relatively low accuracy on the most difficult data sets. This paper introduces DeepGraviLens, a novel multi-modal network that classifies spatio-temporal data belonging to one non-lensed system type and three lensed system types. It surpasses the current state-of-the-art accuracy results by $\approx 3\%$ to $\approx 11\%$, depending on the considered data set. Such an improvement will enable the acceleration of the analysis of lensed objects in upcoming astrophysical surveys, which will exploit the petabytes of data collected, e.g., from the Vera C. Rubin Observatory.  ( 3 min )
    Variational Counterfactual Prediction under Runtime Domain Corruption. (arXiv:2306.13271v1 [cs.LG])
    To date, various neural methods have been proposed for causal effect estimation based on observational data, where a default assumption is the same distribution and availability of variables at both training and inference (i.e., runtime) stages. However, distribution shift (i.e., domain shift) could happen during runtime, and bigger challenges arise from the impaired accessibility of variables. This is commonly caused by increasing privacy and ethical concerns, which can make arbitrary variables unavailable in the entire runtime data and imputation impractical. We term the co-occurrence of domain shift and inaccessible variables runtime domain corruption, which seriously impairs the generalizability of a trained counterfactual predictor. To counter runtime domain corruption, we subsume counterfactual prediction under the notion of domain adaptation. Specifically, we upper-bound the error w.r.t. the target domain (i.e., runtime covariates) by the sum of source domain error and inter-domain distribution distance. In addition, we build an adversarially unified variational causal effect model, named VEGAN, with a novel two-stage adversarial domain adaptation scheme to reduce the latent distribution disparity between treated and control groups first, and between training and runtime variables afterwards. We demonstrate that VEGAN outperforms other state-of-the-art baselines on individual-level treatment effect estimation in the presence of runtime domain corruption on benchmark datasets.  ( 2 min )
    Convergence proof for stochastic gradient descent in the training of deep neural networks with ReLU activation for constant target functions. (arXiv:2112.07369v2 [cs.LG] UPDATED)
In many numerical simulations stochastic gradient descent (SGD) type optimization methods perform very effectively in the training of deep neural networks (DNNs), but to this day it remains an open research problem to provide a mathematical convergence analysis which rigorously explains the success of SGD type optimization methods in the training of DNNs. In this work we study SGD type optimization methods in the training of fully-connected feedforward DNNs with rectified linear unit (ReLU) activation. We first establish general regularity properties for the risk functions and their generalized gradient functions appearing in the training of such DNNs and, thereafter, we investigate the plain vanilla SGD optimization method in the training of such DNNs under the assumption that the target function under consideration is a constant function. Specifically, we prove, under the assumption that the learning rates (the step sizes of the SGD optimization method) are sufficiently small but not $L^1$-summable and that the target function is a constant function, that the expectation of the risk of the considered SGD process converges to zero in the training of such DNNs as the number of SGD steps increases to infinity.  ( 3 min )
    BrainNet: Epileptic Wave Detection from SEEG with Hierarchical Graph Diffusion Learning. (arXiv:2306.13101v1 [eess.SP])
    Epilepsy is one of the most serious neurological diseases, affecting 1-2% of the world's population. The diagnosis of epilepsy depends heavily on the recognition of epileptic waves, i.e., disordered electrical brainwave activity in the patient's brain. Existing works have begun to employ machine learning models to detect epileptic waves via cortical electroencephalogram (EEG). However, the recently developed stereoelectrocorticography (SEEG) method provides information in stereo that is more precise than conventional EEG, and has been broadly applied in clinical practice. Therefore, we propose the first data-driven study to detect epileptic waves in a real-world SEEG dataset. While offering new opportunities, SEEG also poses several challenges. In clinical practice, epileptic wave activities are considered to propagate between different regions in the brain. These propagation paths, also known as the epileptogenic network, are deemed to be a key factor in the context of epilepsy surgery. However, the question of how to extract an exact epileptogenic network for each patient remains an open problem in the field of neuroscience. To address these challenges, we propose a novel model (BrainNet) that jointly learns the dynamic diffusion graphs and models the brain wave diffusion patterns. In addition, our model effectively aids in resisting label imbalance and severe noise by employing several self-supervised learning tasks and a hierarchical framework. By experimenting with the extensive real SEEG dataset obtained from multiple patients, we find that BrainNet outperforms several latest state-of-the-art baselines derived from time-series analysis.  ( 3 min )
    The Inductive Bias of Flatness Regularization for Deep Matrix Factorization. (arXiv:2306.13239v1 [cs.LG])
Recent works on over-parameterized neural networks have shown that the stochasticity in optimizers has the implicit regularization effect of minimizing the sharpness of the loss function (in particular, the trace of its Hessian) over the family of zero-loss solutions. More explicit forms of flatness regularization also empirically improve the generalization performance. However, it remains unclear why and when flatness regularization leads to better generalization. This work takes the first step toward understanding the inductive bias of the minimum trace of the Hessian solutions in an important setting: learning deep linear networks from linear measurements, also known as deep matrix factorization. We show that for all depths greater than one, with the standard Restricted Isometry Property (RIP) on the measurements, minimizing the trace of the Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix parameters (i.e., the product of all layer matrices), which in turn leads to better generalization. We empirically verify our theoretical findings on synthetic datasets.  ( 2 min )
    On tracking varying bounds when forecasting bounded time series. (arXiv:2306.13428v1 [stat.ML])
    We consider a new framework where a continuous, though bounded, random variable has unobserved bounds that vary over time. In the context of univariate time series, we look at the bounds as parameters of the distribution of the bounded random variable. We introduce an extended log-likelihood estimation and design algorithms to track the bound through online maximum likelihood estimation. Since the resulting optimization problem is not convex, we make use of recent theoretical results on Normalized Gradient Descent (NGD) for quasiconvex optimization, to eventually derive an Online Normalized Gradient Descent algorithm. We illustrate and discuss the workings of our approach based on both simulation studies and a real-world wind power forecasting problem.  ( 2 min )
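    The normalized-gradient step itself is simple. The sketch below tracks a drifting upper bound online with NGD; the per-step loss (a hinge pushing the bound above each observation plus a small tightness penalty) is a hypothetical stand-in for the paper's extended log-likelihood, and for a scalar parameter NGD reduces to a sign step.

        import numpy as np

        rng = np.random.default_rng(2)
        T, eta, lam = 2000, 0.05, 0.1
        true_bound = 1.0 + 0.5 * np.sin(np.linspace(0, 4 * np.pi, T))  # unobserved, drifting
        y = true_bound * rng.uniform(0, 1, T)                          # bounded observations

        b = 1.0
        for t in range(T):
            viol = max(y[t] - b, 0.0)
            grad = -2.0 * viol + lam                 # d/db [ max(y - b, 0)^2 + lam * b ]
            b -= eta * grad / (abs(grad) + 1e-12)    # NGD: step along the normalized gradient

        print(f"final estimate {b:.3f} vs true bound {true_bound[-1]:.3f}")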
    Adaptive Planning Search Algorithm for Analog Circuit Verification. (arXiv:2306.13484v1 [cs.AI])
    Integrated circuit verification has gathered considerable interest in recent times. Since these circuits keep growing in complexity year by year, pre-Silicon (pre-SI) verification becomes ever more important, in order to ensure proper functionality. Thus, in order to reduce the time needed for manually verifying ICs, we propose a machine learning (ML) approach, which uses fewer simulations. This method relies on an initial evaluation set of operating condition configurations (OCCs), in order to train Gaussian process (GP) surrogate models. By using surrogate models, we can propose further, more difficult OCCs. Repeating this procedure for several iterations yields better GP estimates of the circuit's responses, on both synthetic and real circuits, and a better chance of finding the worst case, or even failures, for certain circuit responses. Thus, we show that the proposed approach is able to provide OCCs closer to the specifications for all circuits and identify a failure (specification violation) for one of the responses of a real circuit.  ( 2 min )
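    A rough sketch of the iterative surrogate loop, using a Gaussian process on a synthetic one-dimensional "circuit response"; the acquisition rule (a lower confidence bound chasing the pessimistic worst case) and all numbers are illustrative assumptions, not the paper's exact procedure.

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF

        def circuit_response(x):                  # hypothetical response vs. operating condition
            return np.sin(3 * x) + 0.3 * x ** 2

        rng = np.random.default_rng(3)
        X = rng.uniform(-2, 2, (5, 1))            # initial evaluation set of OCCs
        y = circuit_response(X).ravel()

        candidates = np.linspace(-2, 2, 400).reshape(-1, 1)
        for _ in range(10):
            gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), normalize_y=True)
            gp.fit(X, y)
            mu, sigma = gp.predict(candidates, return_std=True)
            nxt = candidates[np.argmin(mu - 2.0 * sigma)]   # propose a harder OCC
            X = np.vstack([X, nxt])
            y = np.append(y, circuit_response(nxt))

        print(f"worst-case response found: {y.min():.3f} at OCC {X[np.argmin(y)].item():.3f}")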
    Minibatch training of neural network ensembles via trajectory sampling. (arXiv:2306.13442v1 [cond-mat.stat-mech])
    Most iterative neural network training methods use estimates of the loss function over small random subsets (or minibatches) of the data to update the parameters, which aid in decoupling the training time from the (often very large) size of the training datasets. Here, we show that a minibatch approach can also be used to train neural network ensembles (NNEs) via trajectory methods in a highly efficient manner. We illustrate this approach by training NNEs to classify images in the MNIST dataset. This method improves training times, which scale as the ratio of the dataset size to the average minibatch size; in the case of MNIST, this gives a computational improvement of typically two orders of magnitude. We highlight the advantage of using longer trajectories to represent NNEs, both for improved accuracy in inference and reduced update cost in terms of the samples needed in minibatch updates.  ( 2 min )
    A New Paradigm for Generative Adversarial Networks based on Randomized Decision Rules. (arXiv:2306.13641v1 [stat.ML])
    The Generative Adversarial Network (GAN) was recently introduced in the literature as a novel machine learning method for training generative models. It has many applications in statistics such as nonparametric clustering and nonparametric conditional independence tests. However, training the GAN is notoriously difficult due to the issue of mode collapse, which refers to the lack of diversity among generated data. In this paper, we identify the reasons why the GAN suffers from this issue, and to address it, we propose a new formulation for the GAN based on randomized decision rules. In the new formulation, the discriminator converges to a fixed point while the generator converges to a distribution at the Nash equilibrium. We propose to train the GAN by an empirical Bayes-like method by treating the discriminator as a hyper-parameter of the posterior distribution of the generator. Specifically, we simulate generators from its posterior distribution conditioned on the discriminator using a stochastic gradient Markov chain Monte Carlo (MCMC) algorithm, and update the discriminator using stochastic gradient descent along with simulations of the generators. We establish convergence of the proposed method to the Nash equilibrium. Apart from image generation, we apply the proposed method to nonparametric clustering and nonparametric conditional independence tests. A portion of the numerical results is presented in the supplementary material.  ( 2 min )
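    A schematic of the two updates under stated assumptions: stochastic gradient Langevin dynamics (one common SG-MCMC choice; the paper's exact kernel may differ) to simulate the generator from its posterior, and plain SGD for the discriminator. The toy linear networks, the data distribution, and the omission of the prior term are all simplifications for illustration.

        import torch
        F = torch.nn.functional

        g = torch.nn.Linear(2, 2)                 # toy generator: noise -> samples
        d = torch.nn.Linear(2, 1)                 # toy discriminator
        opt_d = torch.optim.SGD(d.parameters(), lr=1e-2)
        eps = 1e-3                                # SGLD step size
        real = torch.randn(256, 2) * 0.5 + 1.0    # stand-in data distribution

        for step in range(200):
            z = torch.randn(256, 2)
            fake = g(z).detach()
            # discriminator: ordinary stochastic gradient descent
            loss_d = F.binary_cross_entropy_with_logits(d(real), torch.ones(256, 1)) + \
                     F.binary_cross_entropy_with_logits(d(fake), torch.zeros(256, 1))
            opt_d.zero_grad(); loss_d.backward(); opt_d.step()
            # generator: SGLD = gradient step on the (unnormalized) negative log-posterior
            # plus injected Gaussian noise, so g is *simulated* rather than optimized;
            # the prior term is omitted here for brevity
            loss_g = F.binary_cross_entropy_with_logits(d(g(z)), torch.ones(256, 1))
            grads = torch.autograd.grad(loss_g, list(g.parameters()))
            with torch.no_grad():
                for p, gr in zip(g.parameters(), grads):
                    p.add_(-0.5 * eps * gr + eps ** 0.5 * torch.randn_like(p))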
    SpeedLimit: Neural Architecture Search for Quantized Transformer Models. (arXiv:2209.12127v2 [cs.LG] UPDATED)
    While research in the field of transformer models has primarily focused on enhancing performance metrics such as accuracy and perplexity, practical applications in industry often necessitate a rigorous consideration of inference latency constraints. Addressing this challenge, we introduce SpeedLimit, a novel Neural Architecture Search (NAS) technique that optimizes accuracy whilst adhering to an upper-bound latency constraint. Our method incorporates 8-bit integer quantization in the search process to outperform the current state-of-the-art technique. Our results underline the feasibility and efficacy of seeking an optimal balance between performance and latency, providing new avenues for deploying state-of-the-art transformer models in latency-sensitive environments.  ( 2 min )
    PU GNN: Chargeback Fraud Detection in P2E MMORPGs via Graph Attention Networks with Imbalanced PU Labels. (arXiv:2211.08604v7 [cs.LG] UPDATED)
    The recent advent of play-to-earn (P2E) systems in massively multiplayer online role-playing games (MMORPGs) has made in-game goods interchangeable with real-world values more than ever before. The goods in the P2E MMORPGs can be directly exchanged with cryptocurrencies such as Bitcoin, Ethereum, or Klaytn via blockchain networks. Unlike traditional in-game goods, once they had been written to the blockchains, P2E goods cannot be restored by the game operation teams even with chargeback fraud such as payment fraud, cancellation, or refund. To tackle the problem, we propose a novel chargeback fraud prediction method, PU GNN, which leverages graph attention networks with PU loss to capture both the players' in-game behavior with P2E token transaction patterns. With the adoption of modified GraphSMOTE, the proposed model handles the imbalanced distribution of labels in chargeback fraud datasets. The conducted experiments on three real-world P2E MMORPG datasets demonstrate that PU GNN achieves superior performances over previously suggested methods.  ( 3 min )
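    For concreteness, the following is the standard non-negative PU risk of Kiryo et al. (2017), the kind of PU loss such models typically minimize; whether PU GNN uses exactly this correction, and the class prior below, are assumptions here.

        import torch
        F = torch.nn.functional

        def nn_pu_risk(scores_pos, scores_unl, prior=0.05):
            loss = lambda s, y: F.binary_cross_entropy_with_logits(s, torch.full_like(s, y))
            risk_p_pos = loss(scores_pos, 1.0)            # positives labeled as positives
            risk_p_neg = loss(scores_pos, 0.0)            # positives labeled as negatives
            risk_u_neg = loss(scores_unl, 0.0)            # unlabeled labeled as negatives
            neg_part = risk_u_neg - prior * risk_p_neg    # corrected negative risk
            return prior * risk_p_pos + torch.clamp(neg_part, min=0.0)   # non-negativity clamp

        print(nn_pu_risk(torch.randn(32), torch.randn(512)))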
    Efficient Online Processing with Deep Neural Networks. (arXiv:2306.13474v1 [cs.LG])
    The capabilities and adoption of deep neural networks (DNNs) grow at an exhilarating pace: Vision models accurately classify human actions in videos and identify cancerous tissue in medical scans as precisely as human experts; large language models answer wide-ranging questions, generate code, and write prose, becoming the topic of everyday dinner-table conversations. Even though their uses are exciting, the continually increasing model sizes and computational complexities have a dark side. The economic cost and negative environmental externalities of training and serving models are in evident disharmony with financial viability and climate action goals. Instead of pursuing yet another increase in predictive performance, this dissertation is dedicated to the improvement of neural network efficiency. Specifically, a core contribution addresses the efficiency aspects during online inference. Here, the concept of Continual Inference Networks (CINs) is proposed and explored across four publications. CINs extend prior state-of-the-art methods developed for offline processing of spatio-temporal data and reuse their pre-trained weights, improving their online processing efficiency by an order of magnitude. These advances are attained through a bottom-up computational reorganization and judicious architectural modifications. The benefit to online inference is demonstrated by reformulating several widely used network architectures into CINs, including 3D CNNs, ST-GCNs, and Transformer Encoders. An orthogonal contribution tackles the concurrent adaptation and computational acceleration of a large source model into multiple lightweight derived models. Drawing on fusible adapter networks and structured pruning, Structured Pruning Adapters achieve superior predictive accuracy under aggressive pruning using significantly fewer learned weights compared to fine-tuning with pruning.  ( 2 min )
    DiMSam: Diffusion Models as Samplers for Task and Motion Planning under Partial Observability. (arXiv:2306.13196v1 [cs.RO])
    Task and Motion Planning (TAMP) approaches are effective at planning long-horizon autonomous robot manipulation. However, because they require a planning model, it can be difficult to apply them to domains where the environment and its dynamics are not fully known. We propose to overcome these limitations by leveraging deep generative modeling, specifically diffusion models, to learn constraints and samplers that capture these difficult-to-engineer aspects of the planning model. These learned samplers are composed and combined within a TAMP solver in order to find action parameter values jointly that satisfy the constraints along a plan. To tractably make predictions for unseen objects in the environment, we define these samplers on low-dimensional learned latent embeddings of changing object state. We evaluate our approach in an articulated object manipulation domain and show how the combination of classical TAMP, generative learning, and latent embeddings enables long-horizon constraint-based reasoning.  ( 2 min )
    An Agnostic View on the Cost of Overfitting in (Kernel) Ridge Regression. (arXiv:2306.13185v1 [stat.ML])
    We study the cost of overfitting in noisy kernel ridge regression (KRR), which we define as the ratio between the test error of the interpolating ridgeless model and the test error of the optimally-tuned model. We take an "agnostic" view in the following sense: we consider the cost as a function of sample size for any target function, even if the sample size is not large enough for consistency or the target is outside the RKHS. We analyze the cost of overfitting under a Gaussian universality ansatz using recently derived (non-rigorous) risk estimates in terms of the task eigenstructure. Our analysis provides a more refined characterization of benign, tempered and catastrophic overfitting (cf. Mallinar et al., 2022).  ( 2 min )
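    The defined ratio is easy to render empirically. The sketch below compares near-ridgeless kernel ridge regression against the best-tuned ridge on a synthetic noisy task; every problem detail (target, kernel bandwidth, noise level) is an illustrative assumption.

        import numpy as np
        from sklearn.kernel_ridge import KernelRidge

        rng = np.random.default_rng(4)
        n, n_test = 100, 2000
        X = rng.uniform(-1, 1, (n, 1))
        y = np.sin(4 * X).ravel() + 0.3 * rng.normal(size=n)    # noisy training labels
        X_te = rng.uniform(-1, 1, (n_test, 1))
        y_te = np.sin(4 * X_te).ravel()                         # clean test labels

        def test_err(alpha):
            m = KernelRidge(alpha=alpha, kernel="rbf", gamma=5.0).fit(X, y)
            return np.mean((m.predict(X_te) - y_te) ** 2)

        ridgeless = test_err(1e-8)                              # near-interpolating limit
        tuned = min(test_err(a) for a in np.logspace(-6, 2, 30))
        print(f"cost of overfitting ~ {ridgeless / tuned:.2f}")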
    Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok. (arXiv:2306.13253v1 [cs.LG])
    This paper focuses on predicting the occurrence of grokking in neural networks, a phenomenon in which perfect generalization emerges long after signs of overfitting or memorization are observed. It has been reported that grokking can only be observed with certain hyper-parameters. This makes it critical to identify the parameters that lead to grokking. However, since grokking occurs after a large number of epochs, searching for the hyper-parameters that lead to it is time-consuming. In this paper, we propose a low-cost method to predict grokking without training for a large number of epochs. In essence, by studying the learning curve of the first few epochs, we show that one can predict whether grokking will occur later on. Specifically, if certain oscillations occur in the early epochs, one can expect grokking to occur if the model is trained for a much longer period of time. We propose using the spectral signature of a learning curve derived by applying the Fourier transform to quantify the amplitude of low-frequency components to detect the presence of such oscillations. We also present additional experiments aimed at explaining the cause of these oscillations and characterizing the loss landscape.  ( 3 min )
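    A small sketch of the proposed spectral signature: Fourier-transform an early learning curve, discard the DC component, and sum the amplitude over a low-frequency band. The synthetic curves and the band edge below are placeholders, not the paper's settings.

        import numpy as np

        def low_freq_amplitude(curve, max_freq_bins=10):
            curve = np.asarray(curve, dtype=float)
            curve = curve - curve.mean()                   # remove the DC component
            spectrum = np.abs(np.fft.rfft(curve))
            return spectrum[1:max_freq_bins].sum()         # amplitude in low-frequency bins

        epochs = np.arange(200)
        smooth = np.exp(-epochs / 60.0)                                  # no oscillation
        oscillating = smooth + 0.05 * np.sin(2 * np.pi * epochs / 40)    # early oscillations
        print(low_freq_amplitude(smooth), low_freq_amplitude(oscillating))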
    Higher-order Motif-based Time Series Classification for Forced Oscillation Source Location in Power Grids. (arXiv:2306.13397v1 [cs.LG])
    Time series motifs are used for discovering higher-order structures of time series data. Based on time series motifs, the motif embedding correlation field (MECF) is proposed to characterize higher-order temporal structures of dynamical system time series. A MECF-based unsupervised learning approach is applied in locating the source of the forced oscillation (FO), a periodic disturbance that detrimentally impacts power grids. Locating the FO source is imperative for system stability. Compared with the Fourier analysis, the MECF-based unsupervised learning is applicable under various FO situations, including the single FO, FO with resonance, and multiple sources FOs. The MECF-based unsupervised learning is a data-driven approach without any prior knowledge requirement of system models or typologies. Tests on the UK high-voltage transmission grid illustrate the effectiveness of MECF-based unsupervised learning. In addition, the impacts of coupling strength and measurement noise on locating the FO source by the MECF-based unsupervised learning are investigated.  ( 2 min )
    Diverse Community Data for Benchmarking Data Privacy Algorithms. (arXiv:2306.13216v1 [cs.CR])
    The Diverse Communities Data Excerpts are the core of a National Institute of Standards and Technology (NIST) program to strengthen understanding of tabular data deidentification technologies such as synthetic data. Synthetic data is an ambitious attempt to democratize the benefits of big data; it uses generative models to recreate sensitive personal data with new records for public release. However, it is vulnerable to the same bias and privacy issues that impact other machine learning applications, and can even amplify those issues. When deidentified data distributions introduce bias or artifacts, or leak sensitive information, they propagate these problems to downstream applications. Furthermore, real-world survey conditions such as diverse subpopulations, heterogeneous non-ordinal data spaces, and complex dependencies between features pose specific challenges for synthetic data algorithms. These observations motivate the need for real, diverse, and complex benchmark data to support a robust understanding of algorithm behavior. This paper introduces four contributions: new theoretical work on the relationship between diverse populations and challenges for equitable deidentification; public benchmark data focused on diverse populations and challenging features curated from the American Community Survey; an open source suite of evaluation metrology for deidentified datasets; and an archive of evaluation results on a broad collection of deidentification techniques. The initial set of evaluation results demonstrate the suitability of these tools for investigations in this field.  ( 2 min )
    DLKoopman: A deep learning software package for Koopman theory. (arXiv:2211.08992v2 [cs.LG] UPDATED)
    We present DLKoopman -- a software package for Koopman theory that uses deep learning to learn an encoding of a nonlinear dynamical system into a linear space, while simultaneously learning the linear dynamics. While several previous efforts have either restricted the ability to learn encodings, or been bespoke efforts designed for specific systems, DLKoopman is a generalized tool that can be applied to data-driven learning and optimization of any dynamical system. It can either be trained on data from individual states (snapshots) of a system and used to predict its unknown states, or trained on data from trajectories of a system and used to predict unknown trajectories for new initial states. DLKoopman is available on the Python Package Index (PyPI) as 'dlkoopman', and includes extensive documentation and tutorials. Additional contributions of the package include a novel metric called Average Normalized Absolute Error for evaluating performance, and a ready-to-use hyperparameter search module for improving performance.  ( 2 min )
    Variance-Covariance Regularization Improves Representation Learning. (arXiv:2306.13292v1 [cs.LG])
    Transfer learning has emerged as a key approach in the machine learning domain, enabling the application of knowledge derived from one domain to improve performance on subsequent tasks. Given the often limited information about these subsequent tasks, a strong transfer learning approach calls for the model to capture a diverse range of features during the initial pretraining stage. However, recent research suggests that, without sufficient regularization, the network tends to concentrate on features that primarily reduce the pretraining loss function. This tendency can result in inadequate feature learning and impaired generalization capability for target tasks. To address this issue, we propose Variance-Covariance Regularization (VCR), a regularization technique aimed at fostering diversity in the learned network features. Drawing inspiration from recent advancements in self-supervised learning, our method promotes learned representations that exhibit high variance and minimal covariance, thus preventing the network from focusing solely on loss-reducing features. We empirically validate the efficacy of our method through comprehensive experiments coupled with in-depth analytical studies on the learned representations. In addition, we develop an efficient implementation strategy that assures minimal computational overhead associated with our method. Our results indicate that VCR is a powerful and efficient method for enhancing transfer learning performance for both supervised learning and self-supervised learning, opening new possibilities for future research in this domain.  ( 2 min )
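    A sketch of a variance-covariance regularizer in the spirit described above; the exact weights and form used in the paper may differ, and the terms below mirror the familiar VICReg-style variance and covariance penalties.

        import torch

        def vc_regularizer(z, var_target=1.0, eps=1e-4):
            # z: (batch, dim) representations
            z = z - z.mean(dim=0)
            std = torch.sqrt(z.var(dim=0) + eps)
            var_loss = torch.relu(var_target - std).mean()    # keep per-dimension variance high
            cov = (z.T @ z) / (z.shape[0] - 1)
            off_diag = cov - torch.diag(torch.diag(cov))
            cov_loss = (off_diag ** 2).sum() / z.shape[1]     # decorrelate dimensions
            return var_loss + cov_loss

        z = torch.randn(128, 64, requires_grad=True)
        print(vc_regularizer(z))                              # differentiable penalty term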
    Weak Signal Asymptotics for Sequentially Randomized Experiments. (arXiv:2101.09855v7 [math.ST] UPDATED)
    We use the lens of weak signal asymptotics to study a class of sequentially randomized experiments, including those that arise in solving multi-armed bandit problems. In an experiment with $n$ time steps, we let the mean reward gaps between actions scale to the order $1/\sqrt{n}$ so as to preserve the difficulty of the learning task as $n$ grows. In this regime, we show that the sample paths of a class of sequentially randomized experiments -- adapted to this scaling regime and with arm selection probabilities that vary continuously with state -- converge weakly to a diffusion limit, given as the solution to a stochastic differential equation. The diffusion limit enables us to derive refined, instance-specific characterization of stochastic dynamics, and to obtain several insights on the regret and belief evolution of a number of sequential experiments including Thompson sampling (but not UCB, which does not satisfy our continuity assumption). We show that all sequential experiments whose randomization probabilities have a Lipschitz-continuous dependence on the observed data suffer from sub-optimal regret performance when the reward gaps are relatively large. Conversely, we find that a version of Thompson sampling with an asymptotically uninformative prior variance achieves near-optimal instance-specific regret scaling, including with large reward gaps, but these good regret properties come at the cost of highly unstable posterior beliefs.  ( 3 min )
    AmicroN: A Framework for Generating Annotations for Human Activity Recognition with Granular Micro-Activities. (arXiv:2306.13149v1 [cs.HC])
    Efficient human activity recognition (HAR) using sensor data needs a significant volume of annotated data. The growing volume of unlabelled sensor data has challenged conventional practices for gathering HAR annotations with human-in-the-loop approaches, often leading to the collection of shallower annotations. These shallower annotations ignore the fine-grained micro-activities that constitute any complex activities of daily living (ADL). Recognizing this, in this paper we first analyze the lack of granular annotations in available pre-annotated datasets to understand the practical inconsistencies, and we also perform a detailed survey of human perception surrounding annotations. Drawing motivation from these findings, we next develop the framework AmicroN that can automatically generate micro-activity annotations using locomotive signatures and the available coarse-grain macro-activity labels. In the backend, AmicroN applies change-point detection followed by zero-shot learning with activity embeddings to identify the unseen micro-activities in an unsupervised manner. Rigorous evaluation on publicly available datasets shows that AmicroN can accurately generate micro-activity annotations with a median F1-score of >0.75. Additionally, we also show that AmicroN can be used in a plug-and-play manner with Large Language Models (LLMs) to obtain the micro-activity labels, thus making it more practical for realistic applications.  ( 2 min )
    Visual Adversarial Examples Jailbreak Large Language Models. (arXiv:2306.13213v1 [cs.CR])
    Recently, there has been a surge of interest in introducing vision into Large Language Models (LLMs). The proliferation of large Visual Language Models (VLMs), such as Flamingo, BLIP-2, and GPT-4, signifies an exciting convergence of advancements in both visual and language foundation models. Yet, the risks associated with this integrative approach are largely unexamined. In this paper, we shed light on the security and safety implications of this trend. First, we underscore that the continuous and high-dimensional nature of the additional visual input space intrinsically makes it a fertile ground for adversarial attacks. This unavoidably expands the attack surfaces of LLMs. Second, we highlight that the broad functionality of LLMs also presents visual attackers with a wider array of achievable adversarial objectives, extending the implications of security failures beyond mere misclassification. To elucidate these risks, we study adversarial examples in the visual input space of a VLM. Specifically, against MiniGPT-4, which incorporates safety mechanisms that can refuse harmful instructions, we present visual adversarial examples that can circumvent the safety mechanisms and provoke harmful behaviors of the model. Remarkably, we discover that adversarial examples, even if optimized on a narrow, manually curated derogatory corpus against specific social groups, can universally jailbreak the model's safety mechanisms. A single such adversarial example can generally undermine MiniGPT-4's safety, enabling it to heed a wide range of harmful instructions and produce harmful content far beyond simply imitating the derogatory corpus used in optimization. Unveiling these risks, we accentuate the urgent need for comprehensive risk assessments, robust defense strategies, and the implementation of responsible practices for the secure and safe utilization of VLMs.  ( 3 min )
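    The attack primitive underlying such results, sketched as plain PGD over the continuous image input of a stand-in differentiable scorer (a tiny CNN here; the paper attacks MiniGPT-4's visual pathway, and the perturbation budget below is an assumption).

        import torch

        model = torch.nn.Sequential(              # stand-in differentiable scorer
            torch.nn.Conv2d(3, 8, 3, padding=1), torch.nn.ReLU(),
            torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(), torch.nn.Linear(8, 1),
        )

        x = torch.rand(1, 3, 224, 224)            # benign image in [0, 1]
        x_adv = x.clone()
        eps, alpha = 16 / 255, 2 / 255            # assumed budget / step size
        for _ in range(20):
            x_adv.requires_grad_(True)
            score = model(x_adv).sum()            # proxy for the adversary's objective
            grad, = torch.autograd.grad(score, x_adv)
            with torch.no_grad():
                x_adv = x_adv + alpha * grad.sign()                       # ascent step
                x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)    # project to eps-ball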
    Improving Log-Cumulant Based Estimation of Roughness Information in SAR imagery. (arXiv:2306.13200v1 [cs.CV])
    Synthetic Aperture Radar (SAR) image understanding is crucial in remote sensing applications, but it is hindered by its intrinsic noise contamination, called speckle. Sophisticated statistical models, such as the $\mathcal{G}^0$ family of distributions, have been applied to SAR data and many of the current advancements in processing this imagery have been accomplished through extracting information from these models. In this paper, we propose improvements to parameter estimation in $\mathcal{G}^0$ distributions using the Method of Log-Cumulants. First, using Bayesian modeling, we construct estimators that regularly produce reliable roughness estimates under both $\mathcal{G}^0_A$ and $\mathcal{G}^0_I$ models. Second, we make use of an approximation of the Trigamma function to compute the estimated roughness in constant time, making it considerably faster than the existing method for this task. Finally, we show how we can use this method to achieve fast and reliable SAR image understanding based on roughness information.  ( 2 min )
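    The constant-time ingredient can be sketched with a truncated asymptotic expansion of the trigamma function, checked against SciPy's exact evaluation; the truncation order below is an illustrative choice, not necessarily the paper's.

        import numpy as np
        from scipy.special import polygamma

        def trigamma_approx(x):
            # psi_1(x) ~ 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + ...  (asymptotic series)
            return 1 / x + 1 / (2 * x ** 2) + 1 / (6 * x ** 3) - 1 / (30 * x ** 5)

        x = np.linspace(2.0, 20.0, 5)
        print(polygamma(1, x))         # exact trigamma
        print(trigamma_approx(x))      # constant-time approximation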
    Synthetic data shuffling accelerates the convergence of federated learning under data heterogeneity. (arXiv:2306.13263v1 [cs.LG])
    In federated learning, data heterogeneity is a critical challenge. A straightforward solution is to shuffle the clients' data to homogenize the distribution. However, this may violate data access rights, and how and when shuffling can accelerate the convergence of a federated optimization algorithm is not theoretically well understood. In this paper, we establish a precise and quantifiable correspondence between data heterogeneity and parameters in the convergence rate when a fraction of data is shuffled across clients. We prove that shuffling can quadratically reduce the gradient dissimilarity with respect to the shuffling percentage, accelerating convergence. Inspired by the theory, we propose a practical approach that addresses the data access rights issue by shuffling locally generated synthetic data. The experimental results show that shuffling synthetic data improves the performance of multiple existing federated learning algorithms by a large margin.  ( 2 min )
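    The practical recipe in miniature: each client keeps its real shard private and contributes only locally generated synthetic samples, a fraction of which is pooled, shuffled, and redistributed. The trivial Gaussian "generator" below is purely illustrative; the paper's generators and shuffling fraction are design choices.

        import numpy as np

        rng = np.random.default_rng(10)
        clients = [rng.normal(loc=m, size=(100, 5)) for m in (-2.0, 0.0, 2.0)]  # heterogeneous shards

        def synthesize(data, n):
            # stand-in generator: fit per-feature Gaussians to the local shard
            return rng.normal(data.mean(0), data.std(0), size=(n, data.shape[1]))

        frac = 0.2                                           # fraction of data shuffled
        pool = np.vstack([synthesize(c, int(frac * len(c))) for c in clients])
        rng.shuffle(pool)                                    # homogenize across clients
        shards = np.array_split(pool, len(clients))
        augmented = [np.vstack([c, s]) for c, s in zip(clients, shards)]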
    Semi-Autoregressive Energy Flows: Exploring Likelihood-Free Training of Normalizing Flows. (arXiv:2206.06672v2 [cs.LG] UPDATED)
    Training normalizing flow generative models can be challenging due to the need to calculate computationally expensive determinants of Jacobians. This paper studies the likelihood-free training of flows and proposes the energy objective, an alternative sample-based loss based on proper scoring rules. The energy objective is determinant-free and supports flexible model architectures that are not easily compatible with maximum likelihood training, including semi-autoregressive energy flows, a novel model family that interpolates between fully autoregressive and non-autoregressive models. Energy flows feature competitive sample quality, posterior inference, and generation speed relative to likelihood-based flows; this performance is decorrelated from the quality of log-likelihood estimates, which are generally very poor. Our findings question the use of maximum likelihood as an objective or a metric, and contribute to a scientific study of its role in generative modeling.  ( 2 min )
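    A sample-based energy objective of the kind described, sketched via the energy score, a proper scoring rule that needs only samples from the model and no Jacobian determinants; details such as the norm and the (biased) inclusion of diagonal terms are simplifying assumptions.

        import torch

        def energy_loss(model_samples, data):
            # E||X - Y|| - 0.5 * E||X - X'||, estimated from minibatches
            xy = torch.cdist(model_samples, data).mean()
            xx = torch.cdist(model_samples, model_samples).mean()  # diagonal zeros add slight bias
            return xy - 0.5 * xx

        x = torch.randn(128, 2, requires_grad=True)   # stand-in model samples
        y = torch.randn(128, 2) + 1.0                 # stand-in data batch
        energy_loss(x, y).backward()                  # differentiable and determinant-free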
    Can Continual Learning Improve Long-Tailed Recognition? Toward a Unified Framework. (arXiv:2306.13275v1 [cs.LG])
    The Long-Tailed Recognition (LTR) problem emerges in the context of learning from highly imbalanced datasets, in which the number of samples among different classes is heavily skewed. LTR methods aim to accurately learn a dataset comprising both a larger Head set and a smaller Tail set. We propose a theorem where under the assumption of strong convexity of the loss function, the weights of a learner trained on the full dataset are within an upper bound of the weights of the same learner trained strictly on the Head. Next, we assert that by treating the learning of the Head and Tail as two separate and sequential steps, Continual Learning (CL) methods can effectively update the weights of the learner to learn the Tail without forgetting the Head. First, we validate our theoretical findings with various experiments on the toy MNIST-LT dataset. We then evaluate the efficacy of several CL strategies on multiple imbalanced variations of two standard LTR benchmarks (CIFAR100-LT and CIFAR10-LT), and show that standard CL methods achieve strong performance gains in comparison to baselines and approach solutions that have been tailor-made for LTR. We also assess the applicability of CL techniques on real-world data by exploring CL on the naturally imbalanced Caltech256 dataset and demonstrate its superiority over state-of-the-art classifiers. Our work not only unifies LTR and CL but also paves the way for leveraging advances in CL methods to tackle the LTR challenge more effectively.  ( 3 min )
  • Open

    Precise Asymptotic Generalization for Multiclass Classification with Overparameterized Linear Models. (arXiv:2306.13255v1 [cs.LG])
    We study the asymptotic generalization of an overparameterized linear model for multiclass classification under the Gaussian covariates bi-level model introduced in Subramanian et al. '22, where the number of data points, features, and classes all grow together. We fully resolve the conjecture posed in Subramanian et al. '22, matching the predicted regimes for generalization. Furthermore, our new lower bounds are akin to an information-theoretic strong converse: they establish that the misclassification rate goes to 0 or 1 asymptotically. One surprising consequence of our tight results is that the min-norm interpolating classifier can be asymptotically suboptimal relative to noninterpolating classifiers in the regime where the min-norm interpolating regressor is known to be optimal. The key to our tight analysis is a new variant of the Hanson-Wright inequality which is broadly useful for multiclass problems with sparse labels. As an application, we show that the same type of analysis can be used to analyze the related multilabel classification problem under the same bi-level ensemble.  ( 2 min )
    Uniform Convergence with Square-Root Lipschitz Loss. (arXiv:2306.13188v1 [stat.ML])
    We establish generic uniform convergence guarantees for Gaussian data in terms of the Rademacher complexity of the hypothesis class and the Lipschitz constant of the square root of the scalar loss function. We show how these guarantees substantially generalize previous results based on smoothness (Lipschitz constant of the derivative), and allow us to handle the broader class of square-root-Lipschitz losses, which includes also non-smooth loss functions appropriate for studying phase retrieval and ReLU regression, as well as rederive and better understand "optimistic rate" and interpolation learning guarantees.
    Two derivations of Principal Component Analysis on datasets of distributions. (arXiv:2306.13503v1 [stat.ML])
    In this brief note, we formulate Principal Component Analysis (PCA) over datasets consisting not of points but of distributions, characterized by their location and covariance. Just like the usual PCA on points can be equivalently derived via a variance-maximization principle and via a minimization of reconstruction error, we derive a closed-form solution for distributional PCA from both of these perspectives.
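    One natural closed form consistent with this setup, stated as an assumption rather than a quote of the note: for data given as (mean, covariance) pairs, the total covariance of a random draw decomposes into the scatter of the means plus the average covariance, and the variance-maximization view of distributional PCA diagonalizes that sum.

        import numpy as np

        rng = np.random.default_rng(5)
        mus = rng.normal(size=(50, 3))                                    # locations
        covs = np.array([np.diag(rng.uniform(0.1, 1.0, 3)) for _ in range(50)])  # covariances

        mu_bar = mus.mean(axis=0)
        scatter = ((mus - mu_bar).T @ (mus - mu_bar)) / len(mus)   # between-location scatter
        total = scatter + covs.mean(axis=0)                        # plus average covariance
        eigvals, eigvecs = np.linalg.eigh(total)
        print("principal axes (columns, descending):\n", eigvecs[:, ::-1])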
    Best Arm Identification in Stochastic Bandits: Beyond $\beta$-optimality. (arXiv:2301.03785v2 [stat.ML] UPDATED)
    This paper investigates a hitherto unaddressed aspect of best arm identification (BAI) in stochastic multi-armed bandits in the fixed-confidence setting. Two key metrics for assessing bandit algorithms are computational efficiency and performance optimality (e.g., in sample complexity). In stochastic BAI literature, there have been advances in designing algorithms to achieve optimal performance, but they are generally computationally expensive to implement (e.g., optimization-based methods). There also exist approaches with high computational efficiency, but they have provable gaps to the optimal performance (e.g., the $\beta$-optimal approaches in top-two methods). This paper introduces a framework and an algorithm for BAI that achieves optimal performance with a computationally efficient set of decision rules. The central process that facilitates this is a routine for sequentially estimating the optimal allocations up to sufficient fidelity. Specifically, these estimates are accurate enough for identifying the best arm (hence, achieving optimality) but not overly accurate to an unnecessary extent that creates excessive computational complexity (hence, maintaining efficiency). Furthermore, the existing relevant literature focuses on the family of exponential distributions. This paper considers a more general setting of any arbitrary family of distributions parameterized by their mean values (under mild regularity conditions). The optimality is established analytically, and numerical evaluations are provided to assess the analytical guarantees and compare the performance with those of the existing ones.
    Approximate Causal Effect Identification under Weak Confounding. (arXiv:2306.13242v1 [stat.ML])
    Causal effect estimation has been studied by many researchers when only observational data is available. Sound and complete algorithms have been developed for pointwise estimation of identifiable causal queries. For non-identifiable causal queries, researchers developed polynomial programs to estimate tight bounds on causal effect. However, these are computationally difficult to optimize for variables with large support sizes. In this paper, we analyze the effect of "weak confounding" on causal estimands. More specifically, under the assumption that the unobserved confounders that render a query non-identifiable have small entropy, we propose an efficient linear program to derive the upper and lower bounds of the causal effect. We show that our bounds are consistent in the sense that as the entropy of unobserved confounders goes to zero, the gap between the upper and lower bound vanishes. Finally, we conduct synthetic and real data simulations to compare our bounds with the bounds obtained by the existing work that cannot incorporate such entropy constraints and show that our bounds are tighter for the setting with weak confounders.
    Differentially Private Synthetic Data Using KD-Trees. (arXiv:2306.13211v1 [cs.CR])
    Creation of a synthetic dataset that faithfully represents the data distribution and simultaneously preserves privacy is a major research challenge. Many space partitioning based approaches have emerged in recent years for answering statistical queries in a differentially private manner. However, for synthetic data generation problem, recent research has been mainly focused on deep generative models. In contrast, we exploit space partitioning techniques together with noise perturbation and thus achieve intuitive and transparent algorithms. We propose both data independent and data dependent algorithms for $\epsilon$-differentially private synthetic data generation whose kernel density resembles that of the real dataset. Additionally, we provide theoretical results on the utility-privacy trade-offs and show how our data dependent approach overcomes the curse of dimensionality and leads to a scalable algorithm. We show empirical utility improvements over the prior work, and discuss performance of our algorithm on a downstream classification task on a real dataset.
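    The flavor of the data-independent variant can be sketched with a fixed grid standing in for the kd-tree: perturb cell counts with Laplace(1/epsilon) noise, renormalize, and resample. All concrete choices below (grid resolution, epsilon, the toy data) are illustrative.

        import numpy as np

        rng = np.random.default_rng(11)
        data = rng.beta(2, 5, size=(1000, 2))               # sensitive 2-D records in [0, 1]^2
        eps, bins = 1.0, 8

        counts, xedges, yedges = np.histogram2d(data[:, 0], data[:, 1], bins=bins,
                                                range=[[0, 1], [0, 1]])
        noisy = np.maximum(counts + rng.laplace(0, 1 / eps, counts.shape), 0)
        probs = (noisy / noisy.sum()).ravel()

        cells = rng.choice(bins * bins, size=1000, p=probs)  # sample cells from noisy histogram
        i, j = np.divmod(cells, bins)
        synth = np.column_stack([                            # uniform within each chosen cell
            rng.uniform(xedges[i], xedges[i + 1]),
            rng.uniform(yedges[j], yedges[j + 1]),
        ])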
    Compression with Bayesian Implicit Neural Representations. (arXiv:2305.19185v2 [cs.LG] UPDATED)
    Many common types of data can be represented as functions that map coordinates to signal values, such as pixel locations to RGB values in the case of an image. Based on this view, data can be compressed by overfitting a compact neural network to its functional representation and then encoding the network weights. However, most current solutions for this are inefficient, as quantization to low-bit precision substantially degrades the reconstruction quality. To address this issue, we propose overfitting variational Bayesian neural networks to the data and compressing an approximate posterior weight sample using relative entropy coding instead of quantizing and entropy coding it. This strategy enables direct optimization of the rate-distortion performance by minimizing the $\beta$-ELBO, and allows targeting different rate-distortion trade-offs for a given network architecture by adjusting $\beta$. Moreover, we introduce an iterative algorithm for learning prior weight distributions and employ a progressive refinement process for the variational posterior that significantly enhances performance. Experiments show that our method achieves strong performance on image and audio compression while retaining simplicity.  ( 2 min )
    Curvature Filtrations for Graph Generative Model Evaluation. (arXiv:2301.12906v2 [cs.LG] UPDATED)
    Graph generative model evaluation necessitates understanding differences between graphs on the distributional level. This entails being able to harness salient attributes of graphs in an efficient manner. Curvature constitutes one such property of graphs, and has recently started to prove useful in characterising graphs. Its expressive properties, stability, and practical utility in model evaluation remain largely unexplored, however. We combine graph curvature descriptors with emerging methods from topological data analysis to obtain robust, expressive descriptors for evaluating graph generative models.  ( 2 min )
    On the Convergence Rate of Gaussianization with Random Rotations. (arXiv:2306.13520v1 [cs.LG])
    Gaussianization is a simple generative model that can be trained without backpropagation. It has shown compelling performance on low dimensional data. As the dimension increases, however, it has been observed that the convergence speed slows down. We show analytically that the number of required layers scales linearly with the dimension for Gaussian input. We argue that this is because the model is unable to capture dependencies between dimensions. Empirically, we find the same linear increase in cost for arbitrary input $p(x)$, but observe favorable scaling for some distributions. We explore potential speed-ups and formulate challenges for further research.  ( 2 min )
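    One Gaussianization layer, sketched: a random rotation (obtained here via QR of a Gaussian matrix) followed by per-dimension Gaussianization through the empirical CDF and the Gaussian quantile function. Stacking such layers is the scheme whose required depth the paper analyzes; the input distribution and layer count below are arbitrary.

        import numpy as np
        from scipy.stats import norm

        def gaussianize_layer(X, rng):
            Q, _ = np.linalg.qr(rng.normal(size=(X.shape[1],) * 2))   # random rotation
            Z = X @ Q.T
            out = np.empty_like(Z)
            n = Z.shape[0]
            for j in range(Z.shape[1]):                               # marginal Gaussianization
                ranks = np.argsort(np.argsort(Z[:, j]))
                u = (ranks + 0.5) / n                                 # empirical CDF values
                out[:, j] = norm.ppf(u)                               # map to Gaussian quantiles
            return out

        rng = np.random.default_rng(6)
        X = rng.uniform(-1, 1, (2000, 4)) ** 3                        # non-Gaussian input
        for _ in range(5):
            X = gaussianize_layer(X, rng)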
    COCKATIEL: COntinuous Concept ranKed ATtribution with Interpretable ELements for explaining neural net classifiers on NLP tasks. (arXiv:2305.06754v2 [cs.CL] CROSS LISTED)
    Transformer architectures are complex and their use in NLP, while it has engendered many successes, makes their interpretability or explainability challenging. Recent debates have shown that attention maps and attribution methods are unreliable (Pruthi et al., 2019; Brunner et al., 2019). In this paper, we present some of their limitations and introduce COCKATIEL, which successfully addresses some of them. COCKATIEL is a novel, post-hoc, concept-based, model-agnostic XAI technique that generates meaningful explanations from the last layer of a neural net model trained on an NLP classification task by using Non-Negative Matrix Factorization (NMF) to discover the concepts the model leverages to make predictions and by exploiting a Sensitivity Analysis to estimate accurately the importance of each of these concepts for the model. It does so without compromising the accuracy of the underlying model or requiring a new one to be trained. We conduct experiments in single and multi-aspect sentiment analysis tasks and we show COCKATIEL's superior ability to discover concepts that align with humans' on Transformer models without any supervision, we objectively verify the faithfulness of its explanations through fidelity metrics, and we showcase its ability to provide meaningful explanations in two different datasets.  ( 3 min )
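    The concept-discovery step COCKATIEL builds on, sketched: factor non-negative last-layer activations with NMF so that one factor gives per-example concept strengths and the other gives concept directions. The random activation matrix here stands in for a trained NLP classifier's features.

        import numpy as np
        from sklearn.decomposition import NMF

        rng = np.random.default_rng(7)
        A = np.abs(rng.normal(size=(200, 768)))      # stand-in non-negative activations
        nmf = NMF(n_components=10, init="nndsvd", max_iter=500)
        U = nmf.fit_transform(A)                     # (examples x concepts): concept presence
        concepts = nmf.components_                   # (concepts x features): concept directions
        print(U.shape, concepts.shape)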
    Efficient Model Selection for Predictive Pattern Mining Model by Safe Pattern Pruning. (arXiv:2306.13561v1 [stat.ML])
    Predictive pattern mining is an approach used to construct prediction models when the input is represented by structured data, such as sets, graphs, and sequences. The main idea behind predictive pattern mining is to build a prediction model by considering substructures, such as subsets, subgraphs, and subsequences (referred to as patterns), present in the structured data as features of the model. The primary challenge in predictive pattern mining lies in the exponential growth of the number of patterns with the complexity of the structured data. In this study, we propose the Safe Pattern Pruning (SPP) method to address the explosion of pattern numbers in predictive pattern mining. We also discuss how it can be effectively employed throughout the entire model building process in practical data analysis. To demonstrate the effectiveness of the proposed method, we conduct numerical experiments on regression and classification problems involving sets, graphs, and sequences.  ( 2 min )
    Prediction under Latent Subgroup Shifts with High-Dimensional Observations. (arXiv:2306.13472v1 [stat.ML])
    We introduce a new approach to prediction in graphical models with latent-shift adaptation, i.e., where source and target environments differ in the distribution of an unobserved confounding latent variable. Previous work has shown that as long as "concept" and "proxy" variables with appropriate dependence are observed in the source environment, the latent-associated distributional changes can be identified, and target predictions adapted accurately. However, practical estimation methods do not scale well when the observations are complex and high-dimensional, even if the confounding latent is categorical. Here we build upon a recently proposed probabilistic unsupervised learning framework, the recognition-parametrised model (RPM), to recover low-dimensional, discrete latents from image observations. Applied to the problem of latent shifts, our novel form of RPM identifies causal latent structure in the source environment, and adapts properly to predict in the target. We demonstrate results in settings where predictor and proxy are high-dimensional images, a context to which previous methods fail to scale.  ( 2 min )
    SPRT-based Efficient Best Arm Identification in Stochastic Bandits. (arXiv:2207.11158v3 [stat.ML] UPDATED)
    This paper investigates the best arm identification (BAI) problem in stochastic multi-armed bandits in the fixed confidence setting. The general class of the exponential family of bandits is considered. The existing algorithms for the exponential family of bandits face computational challenges. To mitigate these challenges, the BAI problem is viewed and analyzed as a sequential composite hypothesis testing task, and a framework is proposed that adopts the likelihood ratio-based tests known to be effective for sequential testing. Based on this test statistic, a BAI algorithm is designed that leverages the canonical sequential probability ratio tests for arm selection and is amenable to tractable analysis for the exponential family of bandits. This algorithm has two key features: (1) its sample complexity is asymptotically optimal, and (2) it is guaranteed to be $\delta$-PAC. Existing efficient approaches focus on the Gaussian setting and require Thompson sampling for the arm deemed the best and the challenger arm. Additionally, this paper analytically quantifies the computational expense of identifying the challenger in an existing approach. Finally, numerical experiments are provided to support the analysis.  ( 2 min )
    Convergence of First-Order Methods for Constrained Nonconvex Optimization with Dependent Data. (arXiv:2203.15797v2 [math.OC] UPDATED)
    We focus on analyzing the classical stochastic projected gradient methods under a general dependent data sampling scheme for constrained smooth nonconvex optimization. We show the worst-case rate of convergence $\tilde{O}(t^{-1/4})$ and complexity $\tilde{O}(\varepsilon^{-4})$ for achieving an $\varepsilon$-near stationary point in terms of the norm of the gradient of Moreau envelope and gradient mapping. While classical convergence guarantee requires i.i.d. data sampling from the target distribution, we only require a mild mixing condition of the conditional distribution, which holds for a wide class of Markov chain sampling algorithms. This improves the existing complexity for the constrained smooth nonconvex optimization with dependent data from $\tilde{O}(\varepsilon^{-8})$ to $\tilde{O}(\varepsilon^{-4})$ with a significantly simpler analysis. We illustrate the generality of our approach by deriving convergence results with dependent data for stochastic proximal gradient methods, adaptive stochastic gradient algorithm AdaGrad and stochastic gradient algorithm with heavy ball momentum. As an application, we obtain the first online nonnegative matrix factorization algorithms for dependent data based on stochastic projected gradient methods with adaptive step sizes and optimal rate of convergence.  ( 2 min )
    Understanding quantum machine learning also requires rethinking generalization. (arXiv:2306.13461v1 [quant-ph])
    Quantum machine learning models have shown successful generalization performance even when trained with few data. In this work, through systematic randomization experiments, we show that traditional approaches to understanding generalization fail to explain the behavior of such quantum models. Our experiments reveal that state-of-the-art quantum neural networks accurately fit random states and random labeling of training data. This ability to memorize random data defies current notions of small generalization error, problematizing approaches that build on complexity measures such as the VC dimension, the Rademacher complexity, and all their uniform relatives. We complement our empirical results with a theoretical construction showing that quantum neural networks can fit arbitrary labels to quantum states, hinting at their memorization ability. Our results do not preclude the possibility of good generalization with few training data but rather rule out any possible guarantees based only on the properties of the model family. These findings expose a fundamental challenge in the conventional understanding of generalization in quantum machine learning and highlight the need for a paradigm shift in the design of quantum models for machine learning tasks.  ( 2 min )
    Variational Counterfactual Prediction under Runtime Domain Corruption. (arXiv:2306.13271v1 [cs.LG])
    To date, various neural methods have been proposed for causal effect estimation based on observational data, where a default assumption is the same distribution and availability of variables at both training and inference (i.e., runtime) stages. However, distribution shift (i.e., domain shift) could happen during runtime, and bigger challenges arise from the impaired accessibility of variables. This is commonly caused by increasing privacy and ethical concerns, which can make arbitrary variables unavailable in the entire runtime data and imputation impractical. We term the co-occurrence of domain shift and inaccessible variables runtime domain corruption, which seriously impairs the generalizability of a trained counterfactual predictor. To counter runtime domain corruption, we subsume counterfactual prediction under the notion of domain adaptation. Specifically, we upper-bound the error w.r.t. the target domain (i.e., runtime covariates) by the sum of source domain error and inter-domain distribution distance. In addition, we build an adversarially unified variational causal effect model, named VEGAN, with a novel two-stage adversarial domain adaptation scheme to reduce the latent distribution disparity between treated and control groups first, and between training and runtime variables afterwards. We demonstrate that VEGAN outperforms other state-of-the-art baselines on individual-level treatment effect estimation in the presence of runtime domain corruption on benchmark datasets.  ( 2 min )
    STEEL: Singularity-aware Reinforcement Learning. (arXiv:2301.13152v4 [stat.ML] UPDATED)
    Batch reinforcement learning (RL) aims at leveraging pre-collected data to find an optimal policy that maximizes the expected total rewards in a dynamic environment. Nearly all existing algorithms rely on the absolutely continuous assumption on the distribution induced by target policies with respect to the data distribution, so that the batch data can be used to calibrate target policies via the change of measure. However, the absolute continuity assumption could be violated in practice (e.g., no-overlap support), especially when the state-action space is large or continuous. In this paper, we propose a new batch RL algorithm without requiring absolute continuity in the setting of an infinite-horizon Markov decision process with continuous states and actions. We call our algorithm STEEL: SingulariTy-awarE rEinforcement Learning. Our algorithm is motivated by a new error analysis on off-policy evaluation, where we use maximum mean discrepancy, together with distributionally robust optimization, to characterize the error of off-policy evaluation caused by the possible singularity and to enable model extrapolation. By leveraging the idea of pessimism and under some mild conditions, we derive a finite-sample regret guarantee for our proposed algorithm without imposing absolute continuity. Compared with existing algorithms, by requiring only minimal data-coverage assumption, STEEL significantly improves the applicability and robustness of batch RL. Extensive simulation studies and one real experiment on personalized pricing demonstrate the superior performance of our method in dealing with possible singularity in batch RL.  ( 3 min )
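    The discrepancy at the heart of the error analysis, sketched as a simple (biased, V-statistic) empirical maximum mean discrepancy with an RBF kernel; the paper's estimator and its use inside distributionally robust optimization are more elaborate, and the bandwidth below is hand-picked.

        import numpy as np

        def mmd_rbf(X, Y, sigma=1.0):
            def k(A, B):
                d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
                return np.exp(-d2 / (2 * sigma ** 2))
            return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

        rng = np.random.default_rng(8)
        X = rng.normal(0, 1, (200, 2))
        Y = rng.normal(0.5, 1, (200, 2))
        print(f"MMD^2 estimate: {mmd_rbf(X, Y):.4f}")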
    Lower Complexity Adaptation for Empirical Entropic Optimal Transport. (arXiv:2306.13580v1 [math.ST])
    Entropic optimal transport (EOT) presents an effective and computationally viable alternative to unregularized optimal transport (OT), offering diverse applications for large-scale data analysis. In this work, we derive novel statistical bounds for empirical plug-in estimators of the EOT cost and show that their statistical performance in the entropy regularization parameter $\epsilon$ and the sample size $n$ only depends on the simpler of the two probability measures. For instance, under sufficiently smooth costs this yields the parametric rate $n^{-1/2}$ with factor $\epsilon^{-d/2}$, where $d$ is the minimum dimension of the two population measures. This confirms that empirical EOT also adheres to the lower complexity adaptation principle, a hallmark feature only recently identified for unregularized OT. As a consequence of our theory, we show that the empirical entropic Gromov-Wasserstein distance and its unregularized version for measures on Euclidean spaces also obey this principle. Additionally, we comment on computational aspects and complement our findings with Monte Carlo simulations. Our techniques employ empirical process theory and rely on a dual formulation of EOT over a single function class. Crucial to our analysis is the observation that the entropic cost-transformation of a function class does not increase its uniform metric entropy by much.  ( 2 min )
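    The plug-in estimator the bounds concern, sketched with a few Sinkhorn iterations on empirical measures under a squared-Euclidean cost; the regularization strength and iteration count are illustrative choices.

        import numpy as np

        rng = np.random.default_rng(9)
        n, eps = 200, 0.5
        X, Y = rng.normal(size=(n, 2)), rng.normal(1.0, 1.0, (n, 2))
        C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)     # pairwise cost matrix
        K = np.exp(-C / eps)
        a = b = np.full(n, 1.0 / n)                            # uniform empirical weights

        u = np.ones(n)
        for _ in range(500):                                   # Sinkhorn fixed-point iterations
            v = b / (K.T @ u)
            u = a / (K @ v)
        P = u[:, None] * K * v[None, :]                        # entropic transport plan
        print(f"empirical EOT cost: {(P * C).sum():.3f}")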
    Adversarial Resilience in Sequential Prediction via Abstention. (arXiv:2306.13119v1 [cs.LG])
    We study the problem of sequential prediction in the stochastic setting with an adversary that is allowed to inject clean-label adversarial (or out-of-distribution) examples. Algorithms designed to handle purely stochastic data tend to fail in the presence of such adversarial examples, often leading to erroneous predictions. This is undesirable in many high-stakes applications such as medical recommendations, where abstaining from predictions on adversarial examples is preferable to misclassification. On the other hand, assuming fully adversarial data leads to very pessimistic bounds that are often vacuous in practice. To capture this motivation, we propose a new model of sequential prediction that sits between the purely stochastic and fully adversarial settings by allowing the learner to abstain from making a prediction at no cost on adversarial examples. Assuming access to the marginal distribution on the non-adversarial examples, we design a learner whose error scales with the VC dimension (mirroring the stochastic setting) of the hypothesis class, as opposed to the Littlestone dimension which characterizes the fully adversarial setting. Furthermore, we design a learner for VC dimension 1 classes, which works even in the absence of access to the marginal distribution. Our key technical contribution is a novel measure for quantifying uncertainty for learning VC classes, which may be of independent interest.

  • Open

    With great power comes great ... scams, just scams
    submitted by /u/enspiralart [link] [comments]  ( 8 min )
    Keep getting an RVC error
trying to use RVC on Linux, and I keep getting a python trainset_preprocess_pipline_print.py error any time I try to process anything; regardless of the path, spelling, or characters in the path, everything spits out that exact error. I put up a GitHub issue on the RVC page, but when I logged off I didn't see it, so I guess GitHub shadowbans VPN users or whatever, as I created the account for that issue. Oh well. submitted by /u/sovereign-celestial [link] [comments]  ( 8 min )
    Real-time male-female vocal RVC model. With Italian dataset.
I was experimenting with RVC and created a model for male-to-female voice conversion. The result is acceptable, even if the voice still sounds a bit robotic, and the conversion has a delay of just under a second. I've noticed that RVC can render accents much better than software like voice.ai. I'm posting a demo of my model and taking the opportunity to ask if you know of any other AI-based voice changers that work well with languages other than English. https://reddit.com/link/14ix7bl/video/ihoxh8p2a88b1/player submitted by /u/qubit__ [link] [comments]  ( 8 min )
    AI is a BEAST! Use it, or get left BEHIND!
    submitted by /u/LinedVaccination [link] [comments]  ( 8 min )
    Does anyone know of any good forums pertaining to AI discussion?
I've only come across a couple; I even had ChatGPT attempt to find me some obscure ones. submitted by /u/Maelasae [link] [comments]  ( 8 min )
    Everything You Need to Know about AI
A CNN writer talks about the danger of AI, quoting: "Top AI executives have warned that AI could potentially bring about human extinction. But these same executives are also racing to deploy the technology into their products." What do you think? Is AI dangerous? Or can AI become dangerous in the future? I personally think AI will have a huge impact on the workforce; everything is getting smarter, with more capabilities than ever before. This shift to AI will fuel the next generation of technology, and perhaps the next generation of technology will fuel new dangers... Waiting for your thoughts below! submitted by /u/Hello_I_am_newell [link] [comments]  ( 8 min )
    AI on Trump's lies.
    Prompt: Has Donald J Trump ever lied? Bing AI: "Donald Trump made tens of thousands of false or misleading claims". ChatGPT: "Donald J. Trump has been known to make false or misleading statements throughout his political career". Google Bard: "Whether or not Donald Trump has ever lied is a matter of opinion." submitted by /u/decorama [link] [comments]  ( 8 min )
    AI Gold Explosions Series [workflow in comments]
    submitted by /u/PeePeePeePooPooPooo [link] [comments]  ( 8 min )
    Does anybody know an AI tool where you can upload a video and it will tell you what the video is all about? Or at least the summary of it?
I have a lot of online courses to watch (approximately 50 hours of content), but due to time constraints I am forced to take a shortcut. Is there an available AI tool where you can upload a video (mp4, mkv, etc.) of at least 10 minutes' duration and have it summarized for you? Or at least get as many of the important details as possible. If you know of any, I'd really appreciate your help. Thank you! submitted by /u/virtuabart [link] [comments]  ( 8 min )
    I made a new Michael Jackson album - with AI.
    Using RVC, I made some AI covers, original songs and reconstructions with the voice of Michael Jackson. Check it out! submitted by /u/Some_Cryptographer_9 [link] [comments]  ( 8 min )
    The bigger-is-better approach to AI is running out of road
    submitted by /u/bartturner [link] [comments]  ( 8 min )
    One-Minute Daily AI News 6/24/2023
Sony Music has created a new position, Executive VP Of AI, hiring former BPI CEO Geoff Taylor, according to Billboard, who obtained an internal memo from the company. The new role will involve Sony “coordinating its business efforts surrounding AI” and “coordinating across the global digital business and business and legal affairs division.”[1] Prime Minister Narendra Modi of India held meetings with prominent U.S. and Indian technology executives, such as Tim Cook from Apple, Sundar Pichai from Google, and Satya Nadella from Microsoft. While the details of their conversation were kept private, the statement highlighted the discussion of technology’s power, specifically AI.[2] AI startup Stability AI just debuted its latest version of Stable Diffusion, and the model does not disappoint. Stable Diffusion XL (SDXL) v0.9 delivers ultra-photorealistic imagery, surpassing previous iterations in terms of sophistication and visual quality. This means, among other things, that Stability AI’s new model will not generate those troublesome “spaghetti hands” as often.[3] With no official photos of the Titan wreckage released, unscrupulous Twitter and Facebook accounts have stepped in with fake photos they created using AI image generators.[4] Sources: [1] https://www.stereogum.com/2228177/sony-music-creates-executive-role-dedicated-to-ai/news/ [2] https://www.businesstoday.in/technology/news/story/pm-modi-meets-microsoft-ceo-satya-nadella-discusses-ai-and-technology-cooperation-with-india-386932-2023-06-24 [3] https://decrypt.co/145842/stable-diffusion-upgrade-ai-images-midjourney [4] https://www.pcmag.com/news/ai-generated-images-of-titan-submersible-debris-hit-twitter-facebook submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    AI Tools to making short clips automatically.
Hi, first of all, if this sub isn't the proper place to ask questions like this, let me know in the comments. I'm searching for an AI tool that can make short content for platforms such as TikTok, Instagram Reels, YouTube Shorts, etc. I'm looking for something where the AI makes the content out of full-length videos: it analyzes possible edits, rates them, decides which edit is most interesting to watch, and serves it to me. Basically, all that's left for me is sharing the post on the platforms :) Is there a way to do it? submitted by /u/besterk [link] [comments]  ( 8 min )
    Taking Game AI Beyond Pathfinding and Combat: Building NPCs That Remember and Evolve
    Hi PC gamers, ever wondered what happens in a game world when you're not around? In my current project, the answer is - a lot! I'm working on creating a game world where NPCs don't just exist for player interaction but live their own lives. Every NPC remembers your interactions and has conversations with other NPCs based on those experiences. Even when you're logged off, the world evolves. An NPC could rise from being a town crier to mayor based on their actions and the dynamics they create with other NPCs. Imagine booting up your game after a week, returning to a town to find things have changed in your absence. The guard you'd befriended has become the captain, the merchant you helped is now prosperous, and the rumors about that dragon you slayed have spread across the kingdom, making you a hero even in places you've never visited. It's about creating a game world that’s not static, but dynamic and ever-changing. Want to follow along on this journey, discuss, or contribute ideas? Join us on Discord https://discord.com/invite/4mnzkYSJDg - let's shape this new era of gaming together! submitted by /u/Chance_Confection_37 [link] [comments]  ( 8 min )
    Dee Jesus and friends On the wheels of steel 🎛🎚🎧
    submitted by /u/StoicPrinciples [link] [comments]  ( 8 min )
  • Open

    [R] Deep Reinforcement Learning Policies Learn Shared Adversarial Features across MDPs
    https://ojs.aaai.org/index.php/AAAI/article/view/20684/20443 submitted by /u/ml_dnn [link] [comments]  ( 8 min )
    [Discussion] LLM classification with large number of classes.
Hello and welcome back! I'm working on a classification task with an LLM. It worked with the first version, when I had "only" 5k classes, but now I have a bit more than double that, and it is pretty hard to fit (especially as the number of samples was also multiplied by ~5). Is there an easy-to-implement way to do this? I've seen "Hierarchical Softmax", but 1/ I didn't find a PyTorch implementation to use with transformers, and 2/ I wondered if there is a more recent (and better) way to deal with this kind of problem? Thank you in advance! submitted by /u/ez613 [link] [comments]  ( 8 min )
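    There is no off-the-shelf hierarchical softmax in transformers as far as I know, but a two-level softmax head is easy to hand-roll on top of any hidden state. A minimal PyTorch sketch, assuming classes are assigned to equal-size clusters by index (all names and sizes here are hypothetical; in practice you would bucket classes by clustering their label embeddings):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TwoLevelSoftmax(nn.Module):
        """Two-level softmax: predict a coarse cluster, then the class inside it.
        Per sample, only n_clusters + cluster_size logits are computed instead of
        all n_clusters * cluster_size class logits."""
        def __init__(self, hidden_dim, n_clusters, cluster_size):
            super().__init__()
            self.cluster_head = nn.Linear(hidden_dim, n_clusters)
            # one weight block per cluster; only the target cluster's block is used
            self.class_weight = nn.Parameter(
                torch.randn(n_clusters, cluster_size, hidden_dim) * 0.02)
            self.cluster_size = cluster_size

        def loss(self, h, target):
            cluster = target // self.cluster_size        # coarse label
            within = target % self.cluster_size          # fine label inside the cluster
            loss_cluster = F.cross_entropy(self.cluster_head(h), cluster)
            w = self.class_weight[cluster]               # (B, cluster_size, hidden)
            within_logits = torch.einsum('bh,bch->bc', h, w)
            loss_within = F.cross_entropy(within_logits, within)
            # NLL of p(cluster) * p(class | cluster)
            return loss_cluster + loss_within

    head = TwoLevelSoftmax(hidden_dim=768, n_clusters=100, cluster_size=110)  # ~11k classes
    h = torch.randn(4, 768)                  # e.g. the transformer's pooled output
    target = torch.randint(0, 100 * 110, (4,))
    print(head.loss(h, target))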
    [N] How OpenAI's unique equity compensation works
    In 2019, OpenAI changed from a nonprofit to a ‘capped profit’ model in an effort to raise capital while still serving their mission. Per their blog post, “The fundamental idea of OpenAI LP is that investors and employees can get a capped return if we succeed at our mission, which allows us to raise investment capital and attract employees with startup-like equity. But any returns beyond that amount—and if we are successful, we expect to generate orders of magnitude more value than we’d owe to people who invest in or work at OpenAI LP—are owned by the original OpenAI Nonprofit entity.” How does an OpenAI offer work? There are two components to OpenAI’s offer: the base salary and the ‘equity,’ which they call ‘Profit Participation Units’ or ‘PPUs.’ The base salary is standard and self-explanatory. It’s the cash paycheck you get every two weeks. The profit participation units are where it starts to get confusing, as companies sometimes use profit participation or profit interest grants differently. Here’s a recent OpenAI offer (see all OpenAI offers here), where you can see a base salary of $300k/year with a $2M PPU grant that vests evenly over 4 years, bringing the yearly total compensation to $800k. https://www.levels.fyi/blog/openai-compensation.html submitted by /u/That_Violinist_18 [link] [comments]  ( 8 min )
    Additional Resources
    Hi everyone, After an extended blackout, we've decided to reopen the sub since it became pretty clear that if we didn't then the admins would likely replace us and just reopen anyway. We know lots of you contacted us during the blackout trying to understand how to stay up to date with the latest ML research, news, and discussions. For that reason we are providing additional resources below that either exclusively focus on ml or often discuss ml: taggernews - an ml powered classifier for hackernews posts tagged as ai/ml lobste.rs/t/ai - posts tagged as ai on lobste.rs m/machinelearning - a nascent space for ml discussion on kbin You can also find a more thorough list of subs with additional resources here: https://sub.rehab If you have additional resources you think would be useful, please comment below and we can add them to the list. submitted by /u/dojoteef [link] [comments]  ( 8 min )
  • Open

    Is there a way to save sb3 model as torchscript?
I trained a PPO model using sb3, but I am required to have the model as a TorchScript .pt file due to some limitations. Is there a way to convert my model into a TorchScript model? submitted by /u/naughty_ningen [link] [comments]  ( 8 min )
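    One common route is to wrap the trained policy's networks in a small module and trace it. A hedged sketch, assuming a Box observation space with the default MlpPolicy; the save path is hypothetical, and the internal attribute names (features_extractor, mlp_extractor, action_net) follow SB3's internals and can vary across versions:

    import torch as th
    from stable_baselines3 import PPO

    model = PPO.load("ppo_model")            # hypothetical path to your saved model
    policy = model.policy.to("cpu").eval()

    class DeterministicActor(th.nn.Module):
        """Wraps the SB3 policy so forward() returns only the deterministic action."""
        def __init__(self, policy):
            super().__init__()
            self.features_extractor = policy.features_extractor
            self.mlp_extractor = policy.mlp_extractor
            self.action_net = policy.action_net

        def forward(self, obs):
            feats = self.features_extractor(obs)
            latent_pi, _ = self.mlp_extractor(feats)
            return self.action_net(latent_pi)   # mean action (continuous) or logits (discrete)

    dummy_obs = th.zeros(1, *policy.observation_space.shape)
    traced = th.jit.trace(DeterministicActor(policy), dummy_obs)
    traced.save("ppo_policy.pt")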
    "Relating Neural Text Degeneration to Exposure Bias", Chiang & Chen 2021
    submitted by /u/gwern [link] [comments]  ( 8 min )
  • Open

    LLM Limitations and Hallucinations
    submitted by /u/Neurosymbolic [link] [comments]  ( 8 min )
    A C++ version of micrograd, the simplest autograd engine for neural nets.
I implemented cpp-micrograd, a C++ version of Karpathy's autograd engine: https://github.com/10-zin/cpp-micrograd If you find it interesting, please support the project with stars and contributions. Given the recent breakthrough of C/C++ versions of neural nets, like gerganov's llama.cpp, it made a lot of sense to build some neural nets with C/C++, hence cpp-micrograd. Albeit a toy version, it gives a good understanding of how C++ would implement basic neural nets. IMO it's a very good start to understanding and using C/C++ neural nets like ggml, since no matter how complex the network gets, the basic autograd computation graph will always be the core. The vision is to make a C implementation too, and then go all the way to launching CUDA kernels, while keeping it as educational as possible. Here is what value you can get from the repo currently: if you love micrograd but would also want a C++ version of it; if you want crisp backprop theory and annotated code for the autograd engine; if you want to learn C++ by building neural nets, this could be a good start (it was my purpose). Thanks for reading! submitted by /u/10zin_ [link] [comments]  ( 8 min )
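    For readers deciding whether to dig in: the whole idea fits in a screenful. A minimal Python rendering of the scalar autograd core that cpp-micrograd ports to C++ (a sketch in the spirit of micrograd, not the repo's actual code):

    class Value:
        """Scalar node in the autograd computation graph."""
        def __init__(self, data, children=()):
            self.data, self.grad = data, 0.0
            self._backward = lambda: None
            self._prev = set(children)

        def __add__(self, other):
            other = other if isinstance(other, Value) else Value(other)
            out = Value(self.data + other.data, (self, other))
            def _backward():                     # d(a+b)/da = d(a+b)/db = 1
                self.grad += out.grad
                other.grad += out.grad
            out._backward = _backward
            return out

        def __mul__(self, other):
            other = other if isinstance(other, Value) else Value(other)
            out = Value(self.data * other.data, (self, other))
            def _backward():                     # d(ab)/da = b, d(ab)/db = a
                self.grad += other.data * out.grad
                other.grad += self.data * out.grad
            out._backward = _backward
            return out

        def backward(self):
            # topological sort, then apply the chain rule from the output back
            topo, seen = [], set()
            def build(v):
                if v not in seen:
                    seen.add(v)
                    for c in v._prev:
                        build(c)
                    topo.append(v)
            build(self)
            self.grad = 1.0
            for v in reversed(topo):
                v._backward()

    a, b = Value(2.0), Value(3.0)
    loss = a * b + a
    loss.backward()
    print(a.grad, b.grad)   # d(ab+a)/da = b+1 = 4.0, d(ab+a)/db = a = 2.0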
    Open source NN
Hey guys, where can I get free or low-cost trained neural networks? Say, for IDing fish? submitted by /u/benjaminsmith1981 [link] [comments]  ( 8 min )
    The bigger-is-better approach to AI is running out of road
    submitted by /u/nickb [link] [comments]  ( 8 min )

  • Open

    Hello
    It's nice to see you here on reddit! :) submitted by /u/endrid [link] [comments]  ( 8 min )
    AI Blood in the City (workflow in the comments)
    submitted by /u/PeePeePeePooPooPooo [link] [comments]  ( 8 min )
    Byte-Sized Battles: Merging Philosophical Legends and Cutting-Edge AI Technology
    Greetings, fellow minds of r/artificial! Get ready to have your intellectual gears cranked up to maximum capacity. I bring to you an experimental podcast that combines the profound wisdom of past philosophers with the power of cutting-edge technology. Introducing the mind-bending podcast: Byte-Sized Battles Have you ever pondered what it'd be like to listen to Socrates and Descartes lock horns in a battle of ideas? Or perhaps you've envisioned Nietzsche and Simone de Beauvoir engaging in a gripping debate on the morality of lying? Prepare to have your curiosity satisfied! ‘Byte-Sized Battles’ resurrects these philosophical legends, allowing them to engage in captivating discussions through the magic of artificially generated voices. Now, before you raise an eyebrow and question the legitimacy of this groundbreaking concept, let's take a moment to delve into the fascinating questions it raises: Is it acceptable to create a podcast about philosophy using artificial intelligence? And more broadly, how should we navigate the vast sea of knowledge in this ever-evolving world? The beauty of ‘Byte-Sized Battles’ lies in its ability to showcase both the advantages and disadvantages of AI technology. Through engaging conversations, we hope to initiate a profound exploration of its potential role in the future of podcasting. Let's embark on this journey together! The texts are carefully crafted using ChatGPT, while the voices themselves are brought to life by Genny. We invite you to listen in and join the discussion. Share your thoughts, opinions, and musings—it's all part of the experience! 😊 submitted by /u/Lukas_Bjerg [link] [comments]  ( 9 min )
    I wonder what the NSA does with those petabytes of data every day.
    submitted by /u/katiecharm [link] [comments]  ( 8 min )
    AI to Solve Phone Scams
    I bet a portion of you reading this already know exactly what it contains based on the title. Let's make it happen! If you are not aware, there is an epidemic of phone scams occurring for the past several years. There are entire call centers, mostly located in India, dedicated to scamming American & European senior citizens out of hundreds to hundreds-of-thousands (choose any currency, though mostly in USD). The details on how the scams work vary, but one thing generally remains the same throughout: it is the victim that initiates a phone call to the scam call center to request "tech support", generally in order to remove malware from the victim's machine (malware creates a Windows pop-up that tells the victim that they have been infected by a virus, and provides the victim a phone numb…  ( 9 min )
    AI Faceswap Music Video
    submitted by /u/MoistGranniesASA [link] [comments]  ( 8 min )
    Can I learn AI from home to the point of being hired for it?
I was laid off but have about 18 months of expenses saved. I want to use this time to retrain in AI, ideally in 3-5 months, since most jobs take a long time from hire to start. What would your recommendations be? submitted by /u/Tasty-Window [link] [comments]  ( 8 min )
    One-Minute Daily AI News 6/23/2023
Prime Minister Narendra Modi received a special t-shirt as a gift from Joe Biden on Friday which had his quote on AI printed on it - 'The future is AI - America & India'. PM Modi, during his address to the joint sitting of the US Congress, gave a new definition for AI - America and India.[1] A new generative AI tool is helping designers in the Toyota Research Institute (TRI) get a head start on creating new vehicles.[2] Wimbledon is introducing AI-powered commentary to its coverage this year. The All England Club has teamed up with tech group IBM to offer AI-generated audio commentary and captions in its online highlights videos.[3] Over 1,200 computer hackers from around the world packed UC Berkeley’s Martin Luther King Jr. Student Union last weekend during a 36-hour AI large language model (LLM) hackathon that Berkeley leaders say was the largest event of its kind.[4] Sources: [1] https://www.ndtv.com/india-news/joe-biden-gifts-special-t-shirt-to-pm-narendra-modi-with-quote-on-ai-america-india-4148271/amp/1 [2] https://www.pcmag.com/news/toyota-is-using-generative-ai-to-design-new-evs [3] https://amp.theguardian.com/sport/2023/jun/21/wimbledon-introduce-ai-powered-commentary-to-coverage-this-year [4] https://news.berkeley.edu/2023/06/22/uc-berkeley-cultivates-festive-culture-of-free-thinkers-at-ai-hackathon/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    Does anyone know of an affordable or free text to speech AI that can be used commercially?
    Title says it all. Thank u in advance submitted by /u/yea_okay_dude [link] [comments]  ( 8 min )
    Voice changer that generates unique voices?
    I have been searching for a free voice2voice converter, and the most promising ones are: voice.ai Retrieval Based Voice Conversion so-vits-svc Tortoise TTS What about creating original voices, though? Not sure if it is possible, but I thought of cloning two or more existing voices, then merging them together with audio editing software. Probably easier said than done. submitted by /u/Sculptor_THS [link] [comments]  ( 8 min )
  • Open

    Deep Reinforcement Learning Policies Learn Shared Adversarial Features across MDPs
    submitted by /u/ml_dnn [link] [comments]  ( 8 min )
    BEST Python Libraries when getting started in Machine Learning!
    submitted by /u/keghn [link] [comments]  ( 8 min )
    How do people handle mapping variable length arrays to inputs?
I've been reading about the impressive AI they created for the digital version of Dominion, and they talk about mapping the card abilities to an embedding layer, which I think I understand. The part I'm struggling to get my mind around is how one would actually map a card game where the draw, discard, and hand sizes change constantly. I've seen there's zero padding or recurrent neural networks, but the diagrams (https://www.templegatesgames.com/dominion-ai/) don't look like RNNs, the article makes no mention of them, and zero padding also seems a bit wild given the large upper limits. In every neural network example, the scenario is something like image recognition or Snake or something else that maps well to the fixed-length inputs of NNs. I thought one possibility might be giving a neural network the average number of times a particular card ability appears in one of those arrays. So for instance, if the game had "draw 3 cards" as the highest draw effect possible, and we had one card with "draw 1 card" in our draw pile of 5 cards, we'd pass 1/15, assuming the other cards didn't draw at all. Does that sound reasonable? submitted by /u/lawrieee [link] [comments]  ( 9 min )
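    One common alternative to RNNs for exactly this, and what an embedding layer over cards usually implies, is treating each zone (hand, draw pile, discard) as a padded bag of card ids and pooling with a mask, so the actual size stops mattering. A minimal PyTorch sketch with made-up sizes:

    import torch
    import torch.nn as nn

    class HandEncoder(nn.Module):
        """Encodes a variable-size multiset of cards as one fixed vector:
        embed each card id, mask out the padding, and mean-pool."""
        def __init__(self, n_cards, dim):
            super().__init__()
            self.embed = nn.Embedding(n_cards + 1, dim, padding_idx=0)  # id 0 = padding

        def forward(self, card_ids):               # (batch, max_len), zero-padded
            mask = (card_ids != 0).float().unsqueeze(-1)
            e = self.embed(card_ids) * mask        # padded slots contribute nothing
            return e.sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)

    enc = HandEncoder(n_cards=300, dim=32)
    hands = torch.tensor([[5, 12, 0, 0], [7, 7, 42, 3]])  # two hands, padded to length 4
    print(enc(hands).shape)   # torch.Size([2, 32]) regardless of true hand sizes

    Because the pooling is order-invariant, padding length only wastes a little compute, never changes the output, which is why large upper limits are less scary than they look.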
    Learning ANN
Hello, I am a finance graduate student and want to learn how I can use an ANN for volatility analysis of three stock market indices for my homework. Could you please advise some books, videos, or tips? I want to learn; thanks in advance 🙌🏻 submitted by /u/here-for-reading [link] [comments]  ( 8 min )
Please help: Getting a ValueError from using the mean squared error loss function
I am training a feed-forward neural network that takes inputs of shape (4040,2) and outputs of shape (4040,4,51). I tried using a Pandas DataFrame for it first; however, it does not seem to work with a 3-dimensional output shape. (If there is a way to convert 3-dimensional data into a Pandas DataFrame, please let me know.) So for now I am using NumPy arrays to store the inputs and outputs. But they are not the same shape, and this raises an error when going through the mean squared error loss function. Any comments, solutions, and suggestions are greatly appreciated! The code I have is the following:

    # using functional API
    def build_model90():
        input_layer = Input(shape=repeated_norm_train_dime.shape)
        dense_1 = Dense(units=1024, activation='relu')(input_layer)
        dense_2 = Dense(units=…  ( 9 min )
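    The usual culprits here are passing the whole-dataset shape (including the 4040 sample axis) to Input, and producing a flat Dense output where a (4, 51) target is expected. A hedged, runnable sketch with random stand-in data showing one way to line the shapes up:

    import numpy as np
    from tensorflow.keras.layers import Input, Dense, Reshape
    from tensorflow.keras.models import Model

    # shapes from the post: 4040 samples, 2 input features, (4, 51) output per sample
    X = np.random.rand(4040, 2).astype("float32")
    Y = np.random.rand(4040, 4, 51).astype("float32")

    inp = Input(shape=(2,))                  # per-sample shape, not (4040, 2)
    h = Dense(1024, activation="relu")(inp)
    h = Dense(4 * 51)(h)                     # flat output ...
    out = Reshape((4, 51))(h)                # ... reshaped to match the 3-D target

    model = Model(inp, out)
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, Y, epochs=2, batch_size=64)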
    CNN & imbalanced data
Hi, I've been trying to implement a CNN1d algorithm for binary classification on a dataset, but it's heavily imbalanced (roughly 518 zeros and 90 ones). I've been using cross-validation to evaluate performance, and I always seem to hit around ~85% accuracy but only ~20% for recall and precision. Are there any suggestions? I've tried SMOTE on the training data after forming x_train and such within the k-fold loop, but it doesn't lead to much success. The only time I've been able to reach high figures for recall and precision is when I've used SMOTE on the whole dataset via 'smt.fit_resample(X, Y)', but I guess that's causing data leakage. Thanks in advance. submitted by /u/acr15555 [link] [comments]  ( 8 min )
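    Right, resampling the whole dataset before splitting leaks synthetic copies of test points into training. Besides keeping SMOTE strictly inside each fold, a simpler lever is class weighting in the loss, which needs no synthetic data at all. A hedged PyTorch sketch using the post's class counts (the tensors stand in for your real batches):

    import torch
    import torch.nn as nn

    # up-weight the rare positive class by the imbalance ratio: 518 / 90 ~ 5.76
    pos_weight = torch.tensor([518.0 / 90.0])
    criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

    logits = torch.randn(8)                  # stand-in for the CNN1d's raw (pre-sigmoid) outputs
    targets = torch.randint(0, 2, (8,)).float()
    loss = criterion(logits, targets)        # false negatives now cost ~5.76x more

    Tuning the decision threshold on each fold's validation split (instead of defaulting to 0.5) often moves precision/recall more than resampling does.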
  • Open

    minimax for imperfect-information turn-games?
How does one implement minimax strategies in Poker or similar imperfect-information turn games? If you try to minimize your losses, you always have to assume the other agent has better cards than you, and you'd end up always folding, right? (Well, unless you do have the best hand possible, but that would still lead to a sub-optimal, over-cautious strategy.) Does that mean minimax is only for perfect-information turn games? submitted by /u/Lindayz [link] [comments]  ( 8 min )
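    Plain minimax over-folds for exactly the reason described: in imperfect-information games the minimax-optimal play is a mixed strategy, and the standard way to compute one is regret minimization (CFR in full-scale poker). A toy sketch on rock-paper-scissors, where regret-matching self-play recovers the uniform mixed equilibrium:

    import random

    ACTIONS = 3                                    # 0 = rock, 1 = paper, 2 = scissors
    PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # PAYOFF[mine][theirs]

    def strategy_from(regret):
        pos = [max(r, 0.0) for r in regret]        # play in proportion to positive regret
        total = sum(pos)
        return [p / total for p in pos] if total > 0 else [1.0 / ACTIONS] * ACTIONS

    regrets = [[0.0] * ACTIONS for _ in range(2)]
    strat_sums = [[0.0] * ACTIONS for _ in range(2)]

    for _ in range(50000):
        strats = [strategy_from(r) for r in regrets]
        moves = [random.choices(range(ACTIONS), weights=s)[0] for s in strats]
        for p in range(2):
            me, opp = moves[p], moves[1 - p]
            for alt in range(ACTIONS):             # regret = gain from having played alt instead
                regrets[p][alt] += PAYOFF[alt][opp] - PAYOFF[me][opp]
            for a in range(ACTIONS):
                strat_sums[p][a] += strats[p][a]

    total = sum(strat_sums[0])
    print([round(s / total, 3) for s in strat_sums[0]])  # -> roughly [0.333, 0.333, 0.333]

    The average strategy over iterations, not the final one, is what converges to the equilibrium in zero-sum games.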
    RL for satisfying constraint for some percentage of times?
    Hello, I was wondering if there are RL algos or methods where I can have the agent satisfy a constraint a certain percentage of times? I am using RL for optimal control and as of now, I pass to the agent a mask that limits the actions, but I don't need the constraint to be strict and I would actually prefer the agent to have some more freedom to choose and hopefully it can find more space for optimization. Any resources would be great thanks submitted by /u/LeSUTHU [link] [comments]  ( 8 min )
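    One common recipe for "satisfied X% of the time" is a chance constraint handled with a Lagrangian penalty: subtract an adaptive multiplier times a violation indicator from the reward, and update the multiplier from the observed violation rate. A minimal sketch (the 90% target and all names are illustrative; see constrained RL / RCPO-style methods for the full treatment):

    target_rate = 0.90          # want the constraint satisfied on >= 90% of steps
    lam, lam_lr = 0.0, 0.01     # Lagrange multiplier and its learning rate

    def shaped_reward(env_reward, violated):
        # the agent is trained on this reward instead of the raw one
        return env_reward - lam * float(violated)

    def update_multiplier(episode_violation_rate):
        # grow lambda when violations exceed the allowed 10%, shrink it otherwise
        global lam
        lam = max(0.0, lam + lam_lr * (episode_violation_rate - (1.0 - target_rate)))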
    I want to build a learning agent for a combinatorial game
I am working on building a learning agent for a combinatorial game called Clobber. We are currently using a technique that associates belief values with moves and takes a weighted average; it works well, but I want to look into other techniques that may help improve the agent (say, to work on larger board sizes). How and where should I build the environment? What sort of algorithms should I use? submitted by /u/eoliankeeper [link] [comments]  ( 8 min )
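    On the environment question: for board games, a tiny self-contained class with reset/step/legal_moves is usually all you need before wrapping it for any RL library (and it keeps self-play loops simple). A sketch of Clobber under the usual convention that the player with no legal moves loses; names and encodings are placeholders:

    class ClobberEnv:
        """Minimal Clobber environment skeleton for self-play training."""
        def __init__(self, rows=6, cols=6):
            self.rows, self.cols = rows, cols

        def reset(self):
            # checkerboard start: +1 / -1 stones, player +1 moves first
            self.board = [[1 if (r + c) % 2 == 0 else -1 for c in range(self.cols)]
                          for r in range(self.rows)]
            self.player = 1
            return self._observation()

        def legal_moves(self):
            # a stone may capture an orthogonally adjacent enemy stone
            moves = []
            for r in range(self.rows):
                for c in range(self.cols):
                    if self.board[r][c] != self.player:
                        continue
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        r2, c2 = r + dr, c + dc
                        if 0 <= r2 < self.rows and 0 <= c2 < self.cols \
                                and self.board[r2][c2] == -self.player:
                            moves.append(((r, c), (r2, c2)))
            return moves

        def step(self, move):
            (r, c), (r2, c2) = move
            self.board[r][c] = 0
            self.board[r2][c2] = self.player
            self.player = -self.player
            done = not self.legal_moves()      # opponent stuck -> previous mover wins
            reward = 1.0 if done else 0.0
            return self._observation(), reward, done, {}

        def _observation(self):
            return [row[:] for row in self.board], self.player

    With an interface like this you can slot in MCTS, AlphaZero-style self-play, or plain policy gradients without touching the game logic.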
    RL and cognitive science
Hello everyone, I'm looking for labs in Europe that do RL combined with neuro/cognitive science. I know some labs in France doing that, but not elsewhere. It's for a PhD. Do you have any leads? I would greatly appreciate it. submitted by /u/Jeffrosslostson [link] [comments]  ( 8 min )
    How to add values on a two-dimensional box in openai gym?
Hello, I have an OpenAI Gym observation box with dimensions (2, 300):

    std::vector<uint32_t> shape = { nWifi, history_length };
    Ptr<OpenGymBoxContainer<uint32_t>> box = CreateObject<OpenGymBoxContainer<uint32_t>>(shape);

I want to add values at indexes 0 and 1 of the observation box. How can I do it? Right now, what I'm doing is box->AddValue(history[sta_id][i]), that is, adding values only at index 0 of the box, but I want to add values at both sta_id 0 and sta_id 1. submitted by /u/Status_Morning_3000 [link] [comments]  ( 8 min )
  • Open

    Numbers don’t typically have many prime factors
    Suppose you select a 100-digit number at random. How many distinct prime factors do you think it would have? The answer is smaller than you might think: most likely between 5 and 6. The function ω(n) returns the number of distinct prime factors of n [1]. A theorem of Hardy and Ramanujan says that as […] Numbers don’t typically have many prime factors first appeared on John D. Cook.  ( 5 min )
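    For intuition, the "between 5 and 6" figure is just log log n. A quick sketch checking the Hardy–Ramanujan normal order numerically; factoring actual 100-digit numbers is infeasible, so the empirical check uses 9-digit ones:

    import math
    import random

    # Hardy-Ramanujan: omega(n) is typically about log(log(n))
    print(math.log(math.log(10 ** 100)))       # ~5.44 for a 100-digit number

    def omega(n):
        # count distinct prime factors by trial division (fine for small n)
        count, p = 0, 2
        while p * p <= n:
            if n % p == 0:
                count += 1
                while n % p == 0:
                    n //= p
            p += 1
        return count + (1 if n > 1 else 0)

    sample = [omega(random.randrange(10**8, 10**9)) for _ in range(1000)]
    print(sum(sample) / len(sample))           # hovers near log(log(10**9)) ~ 3, plus a small constant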
  • Open

    An algorithm for checking for palindrome numbers
Introduction: Checking whether a number is a palindrome is a common problem in programming, and there are different approaches to solving it. In this article, we will discuss an algorithm for checking if a number is a palindrome number. We will explain the steps involved in the algorithm, provide examples to illustrate it, and explore… Read More » The post An algorithm for checking for palindrome numbers appeared first on Data Science Central.  ( 21 min )
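    As a concrete instance of the kind of algorithm the article describes (assuming the usual digit-reversal approach), a short Python version:

    def is_palindrome_number(n: int) -> bool:
        """Reverse the digits arithmetically and compare; no string conversion."""
        if n < 0:
            return False
        original, reversed_n = n, 0
        while n > 0:
            reversed_n = reversed_n * 10 + n % 10   # append the last digit
            n //= 10                                # drop the last digit
        return original == reversed_n

    print(is_palindrome_number(12321))  # True
    print(is_palindrome_number(1232))   # False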

  • Open

    Research Focus: Week of June 19, 2023
    In this issue: Our new Responsible AI Maturity Model; FoundWright helps “re-find” web pages; Trace-guided Inductive Synthesis of Recursive Functional Programs; a wait-free algorithm for weak reference counting; and new research on concurrency testing. The post Research Focus: Week of June 19, 2023 appeared first on Microsoft Research.  ( 11 min )
  • Open

    Projected $200,000 in Prizes AI Hackathon (Free to Enter)
Hey everyone, we're hosting a free-to-enter virtual hackathon, All Inclusive Hacks, centered on fostering creative solutions to strengthen inclusivity in AI development. It will be hosted during mid-to-late August, and we expect the prize pool to be in the range of $200k to $300k, sponsored by major tech corporations including Google, OpenAI (creator of ChatGPT), and Microsoft. It is open to everyone aged 13+ regardless of programming experience, and we will have workshops to provide educational resources for our participants. Why should you join this hackathon? It is a great opportunity to: earn loads of prizes; meet peers with similar interests; improve your resume for jobs, internships, and college or grad school admissions; improve your grasp of artificial intelligence (especially important given its increasing prevalence in the industry); and learn from workshops hosted by leading figures in artificial intelligence. Sign Up Link: https://all-inclusive-hacks.devpost.com/ submitted by /u/AllInclusiveHacks [link] [comments]  ( 8 min )
    Any Deep ReLU Network is Shallow
    submitted by /u/nickb [link] [comments]  ( 8 min )
    What Is a Transformer Model?
    submitted by /u/nickb [link] [comments]  ( 8 min )
    Mr kitty - After Dark [Neural Net v27.00]
    submitted by /u/thebeardedmojo [link] [comments]  ( 8 min )
  • Open

    Accelerate time to business insights with the Amazon SageMaker Data Wrangler direct connection to Snowflake
    Amazon SageMaker Data Wrangler is a single visual interface that reduces the time required to prepare data and perform feature engineering from weeks to minutes with the ability to select and clean data, create features, and automate data preparation in machine learning (ML) workflows without writing any code. SageMaker Data Wrangler supports Snowflake, a popular […]  ( 12 min )
    Deploy a serverless ML inference endpoint of large language models using FastAPI, AWS Lambda, and AWS CDK
    For data scientists, moving machine learning (ML) models from proof of concept to production often presents a significant challenge. One of the main challenges can be deploying a well-performing, locally trained model to the cloud for inference and use in other applications. It can be cumbersome to manage the process, but with the right tool, […]  ( 10 min )
  • Open

    Masters thesis about improving an implementation
    I'm a masters student researching RL for a niche application. At the start of my program, I encountered a paper that was very interesting to me, and I was determined to expand on the work. I quickly found that their code was challenging to understand, because code quality was not their focus. It uses messy TF 1.x, often reinvents the wheel, and it is not optimized for performance. I ended up re-implementing the code from scratch, trying my best to follow community standards and good coding practices, which was a painstaking yet insightful journey. I made various improvements, e.g. I was able to decrease training time from a few days to a couple of hours. In my opinion, what I've done can be helpful for other researchers in this domain. However, my advisor is skeptical that my contributions are novel enough for the scientific community, especially since he would like me to eventually publish a paper. I see where he's coming from, as this is more so engineering than science. His background is in OR, not ML/RL, so I'm interested to hear some of your thoughts, r/reinforcementlearning. Is this kind of work useful? What's a good way for me to package it? Cheers! submitted by /u/-gold-panda- [link] [comments]  ( 8 min )
    PPO ignores high rewards in deterministic sytem
Hello, please excuse my naivete and the fact that I don't really seem to understand how on-policy learning works. Check this training out (x-axis: episodes; y-axis: cumulative reward, max 400). One thing that I see is that the critic may be overfitting and strongly underestimating state values? https://preview.redd.it/vy14cw5deq7b1.png?width=190&format=png&auto=webp&s=20090c3ff2fcad0538f5b6fee3f08e21fa41b284 You see, my environment/system is deterministic (and I have an exploration policy in the beginning), yet I can't get my agent to reproduce the steps it took to receive high rewards in the beginning; instead it gets locked into sub-optimal reward behaviour, here around ~0, though sometimes it can be in the middle. It never reproduces the high-reward actions. I tried multiplying the loss by the value of the reward, or changing hyperparameters and the learning rate; this just seems to get stuck even quicker. How can I modify my PPO to be less oblivious to these high-reward episodes? I calculate my loss like this:

    ratios = torch.exp(policy - policy_old)
    surr1 = ratios * advantages
    surr2 = torch.clamp(ratios, 1.0 - clip_epsilon, 1.0 + clip_epsilon) * advantages
    actor_loss = -torch.min(surr1, surr2).mean()

where advantages is a tensor of discounted advantages for each step. submitted by /u/Vae94 [link] [comments]  ( 9 min )
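    Two standard tweaks for exactly this "locks into a suboptimal mode" symptom are advantage normalization and an entropy bonus; a hedged sketch on top of the loss above, where dist is assumed to be your policy's action-distribution object (not a guaranteed cure, just the usual first things to try):

    # normalize advantages so a few huge early returns don't destabilize the updates
    advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)

    ratios = torch.exp(policy - policy_old)
    surr1 = ratios * advantages
    surr2 = torch.clamp(ratios, 1.0 - clip_epsilon, 1.0 + clip_epsilon) * advantages
    entropy_bonus = dist.entropy().mean()    # dist = your action distribution (assumed)
    actor_loss = -torch.min(surr1, surr2).mean() - 0.01 * entropy_bonus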
  • Open

    Am I crazy or does this look AI generated?
    submitted by /u/Mayo4Life [link] [comments]  ( 8 min )
    AI chatbot specially designed for 'counter-trolling' or 'counter-cyberbullying'?
I am sure this has been discussed for a while, but has anyone actually deployed such a thing? Basically, the AI is smart enough to analyze trolls and bullies and then perform counterattacks on them. It doesn't just analyze their posting history; it also analyzes visual information about them, and it is capable of scanning the web for relevant information to use against them. I am not sure if this is within legal boundaries, though. submitted by /u/Absolute-Nobody0079 [link] [comments]  ( 8 min )
🎨 Midjourney v5.2 is truly mind-blowing.
Midjourney, an AI image generator, has released its latest version, 5.2, with exciting new features, and it is mind-blowing. The "Zoom Out" feature allows users to expand the field of view, with options to zoom out by 1.5x and 2x. Additionally, the update includes a "Make Square" option and a "Custom Zoom" tool for altering text prompts and aspect ratios. The new version also introduces a "shorten command" option, enabling users to analyze the impact of words on the generated image. In tests, Midjourney performed well with zooming out but had difficulties with changing aspect ratios. Midjourney v5.2 is accessible through a Discord channel and paid plans. Worth playing with. Check it out: yarocelis.org submitted by /u/Yavero [link] [comments]  ( 8 min )
    AI — weekly megathread!
    This week in AI - partnered with aibrews.com feel free to follow their newsletter News & Insights Stability AI has announced SDXL 0.9, a significant upgrade to their text-to-image model suite that can generate hyper-realistic images. SDXL 0.9 has one of the largest parameter counts in open-source image models (3.5B) and is available on the Clipdrop by Stability AI platform [Details]. Google presents AudioPaLM, a Large Language Model that can speak and listen. AudioPaLM fuses text-based PaLM-2 and speech-based AudioLM models into a unified multimodal architecture that can process and generate text and speech [Examples | paper]. Google researchers present DreamHuman, a method to generate realistic animatable 3D human avatar models solely from textual descriptions [Details]. Meta intro…  ( 10 min )
    Intel Discloses New Details On Meteor Lake VPU Block, Lays Out Vision For Client AI
    submitted by /u/eberkut [link] [comments]  ( 8 min )
    Any good AI for presentation generation?
I tried 15 different AI tools yesterday, such as tome or magicwizard, but all of them are "ok" at best. None can make adjustments to the generated presentation via prompts; it has to be done manually, which defeats the purpose. So, do any of you know of a good AI that makes great presentations and can make adjustments based on prompts? Thank you! Edit: before anyone suggests it, yes, I tried magicslides and it's not that great. Also, it would be great if you could give the AI documents like PDFs and have it make presentations from that content. submitted by /u/Rule34withRule16 [link] [comments]  ( 8 min )
    How AI can distort human beliefs
    submitted by /u/bartturner [link] [comments]  ( 8 min )
    Any current AI Scientists that I can ask a few questions to?
I am a career changer from London transitioning into AI (from econ and math). I am planning to do a masters in the States, and I keep hearing the US job market is a dream compared to the UK one, along with other claims I don't know how to verify since I am an outsider both in terms of country and industry. Are there any current AI scientists who would be willing to answer a few questions? Is the US really a much better market for jobs and experience? On a similar note, are there any specific universities or graduate degrees that are particularly respected in the industry? For example, the Columbia MFE is really good for entering buy-side finance (an analogy I am familiar with because of my econ background), but I am not sure if there are similar target schools for tech/AI/FAANG. My plan is to become an AI scientist (mainly on the corporate side), and that's what I am working towards. Cheers to anyone who responds; you have my blessings, and I will treat you to food if we ever meet up! submitted by /u/Icy-Bid-5585 [link] [comments]  ( 8 min )
    China: Artificial Intelligence Searches for Rare Earths
Looks like artificial intelligence also helps with mining. Interesting to see; maybe also in oil and gas? 🤔 From what I know from oil workers in my family, the search part takes ages while the extraction part is rather simple. submitted by /u/easyskinseasylife [link] [comments]  ( 8 min )
    One-Minute Daily AI News 6/22/2023
    DeepMind latest paper introduces a self-improving AI agent for robotics, RoboCat, that learns to perform a variety of tasks across different arms, and then self-generates new training data to improve its technique.[1] OpenAI has lobbied for significant elements of the most comprehensive AI legislation in the world—the E.U.’s AI Act—to be watered down in ways that would reduce the regulatory burden on the company.[2] In an apparent bid to assert its presence in the rapidly expanding AI landscape, Amazon Web Services (AWS)—the retail giant’s sizable cloud computing arm—has introduced a fund of $100 million to bolster startups focusing on generative AI.[3] Over the past year, more than 100,000 login credentials to the popular artificial intelligence chatbot ChatGPT have been leaked and traded on the dark web, according to a Singaporean cybersecurity firm.[4] Sources: [1] https://www.deepmind.com/blog/robocat-a-self-improving-robotic-agent [2] https://time.com/6288245/openai-eu-lobbying-ai-act/ [3] https://decrypt.co/145879/amazon-pledges-100-million-generative-ai-startups [4] https://cointelegraph.com/news/dark-web-chatgpt-logins-leaked-group-ib-open-ai submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
  • Open

    Preference learning with automated feedback for cache eviction
Posted by Ramki Gummadi, Software Engineer, Google and Kevin Chen, Software Engineer, YouTube. Caching is a ubiquitous idea in computer science that significantly improves the performance of storage and retrieval systems by storing a subset of popular items closer to the client based on request patterns. An important algorithmic piece of cache management is the decision policy used for dynamically updating the set of items being stored, which has been extensively optimized over several decades, resulting in several efficient and robust heuristics. While applying machine learning to cache policies has shown promising results in recent years (e.g., LRB, LHD, storage applications), it remains a challenge to outperform robust heuristics in a way that can generalize reliably beyond benchmar…  ( 93 min )
  • Open

    Every factorial is a power
The previous post mentioned that 24! ≈ 10^24 and 25! ≈ 10^25. For every n, there is some base b such that n! = b^n. For example, 30! ≈ 12^30. It's easy to find b: b = (n!)^(1/n). What's interesting is that b is very nearly a linear function of n. In hindsight it's clear that this should […] Every factorial is a power first appeared on John D. Cook.  ( 5 min )
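    A quick numerical check of the near-linearity claim, using lgamma to avoid astronomically large integers; Stirling's formula explains it, since (n!)^(1/n) ≈ n/e for large n:

    import math

    def b_for(n):
        # b = (n!)**(1/n), computed as exp(log(n!) / n) via lgamma
        return math.exp(math.lgamma(n + 1) / n)

    for n in (24, 25, 30, 100, 1000):
        print(n, round(b_for(n), 2), round(n / math.e, 2))   # b tracks n/e, i.e. linear in n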
  • Open

    Deep Learning Digs Deep: AI Unveils New Large-Scale Images in Peruvian Desert
    Researchers at Yamagata University in Japan have harnessed AI to uncover four previously unseen geoglyphs — images on the ground, some as wide as 1,200 feet, made using the land’s elements — in Nazca, a seven-hour drive south of Lima, Peru. The geoglyphs — a humanoid, a pair of legs, a fish and a bird Read article >  ( 4 min )
  • Open

    scikit-fda: A Python Package for Functional Data Analysis. (arXiv:2211.02566v2 [stat.CO] UPDATED)
    The library scikit-fda is a Python package for Functional Data Analysis (FDA). It provides a comprehensive set of tools for representation, preprocessing, and exploratory analysis of functional data. The library is built upon and integrated in Python's scientific ecosystem. In particular, it conforms to the scikit-learn application programming interface so as to take advantage of the functionality for machine learning provided by this package: pipelines, model selection, and hyperparameter tuning, among others. The scikit-fda package has been released as free and open-source software under a 3-Clause BSD license and is open to contributions from the FDA community. The library's extensive documentation includes step-by-step tutorials and detailed examples of use.
    Decentralized Online Federated G-Network Learning for Lightweight Intrusion Detection. (arXiv:2306.13029v1 [cs.CR])
    Cyberattacks are increasingly threatening networked systems, often with the emergence of new types of unknown (zero-day) attacks and the rise of vulnerable devices. While Machine Learning (ML)-based Intrusion Detection Systems (IDSs) have been shown to be extremely promising in detecting these attacks, the need to learn large amounts of labelled data often limits the applicability of ML-based IDSs to cybersystems that only have access to private local data. To address this issue, this paper proposes a novel Decentralized and Online Federated Learning Intrusion Detection (DOF-ID) architecture. DOF-ID is a collaborative learning system that allows each IDS used for a cybersystem to learn from experience gained in other cybersystems in addition to its own local data without violating the data privacy of other systems. As the performance evaluation results using public Kitsune and Bot-IoT datasets show, DOF-ID significantly improves the intrusion detection performance in all collaborating nodes simultaneously with acceptable computation time for online learning.
    Explainable Representations for Relation Prediction in Knowledge Graphs. (arXiv:2306.12687v1 [cs.LG])
    Knowledge graphs represent real-world entities and their relations in a semantically-rich structure supported by ontologies. Exploring this data with machine learning methods often relies on knowledge graph embeddings, which produce latent representations of entities that preserve structural and local graph neighbourhood properties, but sacrifice explainability. However, in tasks such as link or relation prediction, understanding which specific features better explain a relation is crucial to support complex or critical applications. We propose SEEK, a novel approach for explainable representations to support relation prediction in knowledge graphs. It is based on identifying relevant shared semantic aspects (i.e., subgraphs) between entities and learning representations for each subgraph, producing a multi-faceted and explainable representation. We evaluate SEEK on two real-world highly complex relation prediction tasks: protein-protein interaction prediction and gene-disease association prediction. Our extensive analysis using established benchmarks demonstrates that SEEK achieves significantly better performance than standard learning representation methods while identifying both sufficient and necessary explanations based on shared semantic aspects.
    Frouros: A Python library for drift detection in machine learning systems. (arXiv:2208.06868v2 [cs.LG] UPDATED)
    Frouros is an open-source Python library capable of detecting drift in machine learning systems. It provides a combination of classical and more recent algorithms for drift detection: both concept and data drift. We have designed it with the objective of making it compatible with any machine learning framework and easily adaptable to real-world use cases. The library is developed following a set of best development and continuous integration practices to ensure ease of maintenance and extensibility. The source code is available at https://github.com/IFCA/frouros.
    SQ Lower Bounds for Learning Bounded Covariance GMMs. (arXiv:2306.13057v1 [cs.LG])
    We study the complexity of learning mixtures of separated Gaussians with common unknown bounded covariance matrix. Specifically, we focus on learning Gaussian mixture models (GMMs) on $\mathbb{R}^d$ of the form $P= \sum_{i=1}^k w_i \mathcal{N}(\boldsymbol \mu_i,\mathbf \Sigma_i)$, where $\mathbf \Sigma_i = \mathbf \Sigma \preceq \mathbf I$ and $\min_{i \neq j} \| \boldsymbol \mu_i - \boldsymbol \mu_j\|_2 \geq k^\epsilon$ for some $\epsilon>0$. Known learning algorithms for this family of GMMs have complexity $(dk)^{O(1/\epsilon)}$. In this work, we prove that any Statistical Query (SQ) algorithm for this problem requires complexity at least $d^{\Omega(1/\epsilon)}$. In the special case where the separation is on the order of $k^{1/2}$, we additionally obtain fine-grained SQ lower bounds with the correct exponent. Our SQ lower bounds imply similar lower bounds for low-degree polynomial tests. Conceptually, our results provide evidence that known algorithms for this problem are nearly best possible.
    MultiTASC: A Multi-Tenancy-Aware Scheduler for Cascaded DNN Inference at the Consumer Edge. (arXiv:2306.12830v1 [cs.LG])
    Cascade systems comprise a two-model sequence, with a lightweight model processing all samples and a heavier, higher-accuracy model conditionally refining harder samples to improve accuracy. By placing the light model on the device side and the heavy model on a server, model cascades constitute a widely used distributed inference approach. With the rapid expansion of intelligent indoor environments, such as smart homes, the new setting of Multi-Device Cascade is emerging where multiple and diverse devices are to simultaneously use a shared heavy model on the same server, typically located within or close to the consumer environment. This work presents MultiTASC, a multi-tenancy-aware scheduler that adaptively controls the forwarding decision functions of the devices in order to maximize the system throughput, while sustaining high accuracy and low latency. By explicitly considering device heterogeneity, our scheduler improves the latency service-level objective (SLO) satisfaction rate by 20-25 percentage points (pp) over state-of-the-art cascade methods in highly heterogeneous setups, while serving over 40 devices, showcasing its scalability.
    Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model. (arXiv:2306.12968v1 [cs.SI])
    We consider the problem of recovering hidden communities in the Labeled Stochastic Block Model (LSBM) with a finite number of clusters, where cluster sizes grow linearly with the total number $n$ of items. In the LSBM, a label is (independently) observed for each pair of items. Our objective is to devise an efficient algorithm that recovers clusters using the observed labels. To this end, we revisit instance-specific lower bounds on the expected number of misclassified items satisfied by any clustering algorithm. We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability. IAC consists of a one-time spectral clustering algorithm followed by an iterative likelihood-based cluster assignment improvement. This approach is based on the instance-specific lower bound and does not require any model parameters, including the number of clusters. By performing the spectral clustering only once, IAC maintains an overall computational complexity of $\mathcal{O}(n \text{polylog}(n))$. We illustrate the effectiveness of our approach through numerical experiments.
    Corrector Operator to Enhance Accuracy and Reliability of Neural Operator Surrogates of Nonlinear Variational Boundary-Value Problems. (arXiv:2306.12047v2 [math.NA] UPDATED)
    This work focuses on developing methods for approximating the solution operators of a class of parametric partial differential equations via neural operators. Neural operators have several challenges, including the issue of generating appropriate training data, cost-accuracy trade-offs, and nontrivial hyperparameter tuning. The unpredictability of the accuracy of neural operators impacts their applications in downstream problems of inference, optimization, and control. A framework is proposed based on the linear variational problem that gives the correction to the prediction furnished by neural operators. The operator associated with the corrector problem is referred to as the corrector operator. Numerical results involving a nonlinear diffusion model in two dimensions with PCANet-type neural operators show almost two orders of increase in the accuracy of approximations when neural operators are corrected using the proposed scheme. Further, topology optimization involving a nonlinear diffusion model is considered to highlight the limitations of neural operators and the efficacy of the correction scheme. Optimizers with neural operator surrogates are seen to make significant errors (as high as 80 percent). However, the errors are much lower (below 7 percent) when neural operators are corrected following the proposed method.
    Fitted Value Iteration Methods for Bicausal Optimal Transport. (arXiv:2306.12658v1 [stat.ML])
    We develop a fitted value iteration (FVI) method to compute bicausal optimal transport (OT) where couplings have an adapted structure. Based on the dynamic programming formulation, FVI adopts a function class to approximate the value functions in bicausal OT. Under the concentrability condition and approximate completeness assumption, we prove the sample complexity using (local) Rademacher complexity. Furthermore, we demonstrate that multilayer neural networks with appropriate structures satisfy the crucial assumptions required in sample complexity proofs. Numerical experiments reveal that FVI outperforms linear programming and adapted Sinkhorn methods in scalability as the time horizon increases, while still maintaining acceptable accuracy.
    Quantum Pufferfish Privacy: A Flexible Privacy Framework for Quantum Systems. (arXiv:2306.13054v1 [quant-ph])
    We propose a versatile privacy framework for quantum systems, termed quantum pufferfish privacy (QPP). Inspired by classical pufferfish privacy, our formulation generalizes and addresses limitations of quantum differential privacy by offering flexibility in specifying private information, feasible measurements, and domain knowledge. We show that QPP can be equivalently formulated in terms of the Datta-Leditzky information spectrum divergence, thus providing the first operational interpretation thereof. We reformulate this divergence as a semi-definite program and derive several properties of it, which are then used to prove convexity, composability, and post-processing of QPP mechanisms. Parameters that guarantee QPP of the depolarization mechanism are also derived. We analyze the privacy-utility tradeoff of general QPP mechanisms and, again, study the depolarization mechanism as an explicit instance. The QPP framework is then applied to privacy auditing for identifying privacy violations via a hypothesis testing pipeline that leverages quantum algorithms. Connections to quantum fairness and other quantum divergences are also explored and several variants of QPP are examined.
    ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings. (arXiv:2305.11554v2 [cs.CL] UPDATED)
    Augmenting large language models (LLMs) with external tools has emerged as a promising approach to solving complex problems. However, traditional methods, which finetune LLMs with tool demonstration data, can be both costly and restricted to a predefined set of tools. Recent in-context learning paradigm alleviates these issues, but the limited context length only allows for a few shots of demonstrations, leading to suboptimal understandings of the tools. Moreover, when there are numerous tools to choose from, in-context learning could completely fail to work. In this paper, we propose an alternative approach, $\textbf{ToolkenGPT}$, which combines the benefits of both sides. Our approach represents each $\underline{tool}$ as a to$\underline{ken}$ ($\textit{toolken}$) and learns an embedding for it, enabling tool calls in the same way as generating a regular word token. Once a toolken is triggered, the LLM is prompted to complete arguments for the tool to execute. ToolkenGPT offers the flexibility to plug in an arbitrary number of tools by expanding the set of toolkens on the fly. In addition, it improves tool use by allowing extensive demonstration data for learning the toolken embeddings. In diverse domains, including numerical reasoning, knowledge-based question answering, and embodied plan generation, our approach effectively augments LLMs with tools and substantially outperforms various latest baselines. ToolkenGPT demonstrates the promising ability to use relevant tools from a large tool set in complex scenarios.
    Efficient Deep Spiking Multi-Layer Perceptrons with Multiplication-Free Inference. (arXiv:2306.12465v1 [cs.NE])
    Advancements in adapting deep convolution architectures for Spiking Neural Networks (SNNs) have significantly enhanced image classification performance and reduced computational burdens. However, the inability of Multiplication-Free Inference (MFI) to harmonize with attention and transformer mechanisms, which are critical to superior performance on high-resolution vision tasks, imposes limitations on these gains. To address this, our research explores a new pathway, drawing inspiration from the progress made in Multi-Layer Perceptrons (MLPs). We propose an innovative spiking MLP architecture that uses batch normalization to retain MFI compatibility and introduces a spiking patch encoding layer to reinforce local feature extraction capabilities. As a result, we establish an efficient multi-stage spiking MLP network that effectively blends global receptive fields with local feature extraction for comprehensive spike-based computation. Without relying on pre-training or sophisticated SNN training techniques, our network secures a top-1 accuracy of 66.39% on the ImageNet-1K dataset, surpassing the directly trained spiking ResNet-34 by 2.67%. Furthermore, we curtail computational costs, model capacity, and simulation steps. An expanded version of our network challenges the performance of the spiking VGG-16 network with a 71.64% top-1 accuracy, all while operating with a model capacity 2.1 times smaller. Our findings accentuate the potential of our deep SNN architecture in seamlessly integrating global and local learning abilities. Interestingly, the trained receptive field in our network mirrors the activity patterns of cortical cells.
    Siamese SIREN: Audio Compression with Implicit Neural Representations. (arXiv:2306.12957v1 [cs.SD])
    Implicit Neural Representations (INRs) have emerged as a promising method for representing diverse data modalities, including 3D shapes, images, and audio. While recent research has demonstrated successful applications of INRs in image and 3D shape compression, their potential for audio compression remains largely unexplored. Motivated by this, we present a preliminary investigation into the use of INRs for audio compression. Our study introduces Siamese SIREN, a novel approach based on the popular SIREN architecture. Our experimental results indicate that Siamese SIREN achieves superior audio reconstruction fidelity while utilizing fewer network parameters compared to previous INR architectures.
    The False Dawn: Reevaluating Google's Reinforcement Learning for Chip Macro Placement. (arXiv:2306.09633v3 [cs.LG] CROSS LISTED)
    Reinforcement learning (RL) for physical design of silicon chips in a Google 2021 Nature paper stirred controversy due to poorly documented claims that raised eyebrows and attracted critical media coverage. The Nature paper withheld most inputs needed to produce reported results and some critical steps in the methodology. But two separate evaluations filled in the gaps and demonstrated that Google RL lags behind human designers, behind a well-known algorithm (Simulated Annealing), and also behind generally-available commercial software. Crosschecked data indicate that the integrity of the Nature paper is substantially undermined owing to errors in the conduct, analysis and reporting.
    Text-Driven Foley Sound Generation With Latent Diffusion Model. (arXiv:2306.10359v2 [cs.SD] UPDATED)
Foley sound generation aims to synthesise the background sound for multimedia content. Previous models usually employ a large development set with labels as input (e.g., single numbers or one-hot vector). In this work, we propose a diffusion model based system for Foley sound generation with text conditions. To alleviate the data scarcity issue, our model is initially pre-trained with large-scale datasets and fine-tuned to this task via transfer learning using the contrastive language-audio pretraining (CLAP) technique. We have observed that the feature embedding extracted by the text encoder can significantly affect the performance of the generation model. Hence, we introduce a trainable layer after the encoder to improve the text embedding produced by the encoder. In addition, we further refine the generated waveform by generating multiple candidate audio clips simultaneously and selecting the best one, which is determined in terms of the similarity score between the embedding of the candidate clips and the embedding of the target text label. Using the proposed method, our system ranks ${1}^{st}$ among the systems submitted to DCASE Challenge 2023 Task 7. The results of the ablation studies illustrate that the proposed techniques significantly improve sound generation performance. The codes for implementing the proposed system are available online.
    Beyond OOD State Actions: Supported Cross-Domain Offline Reinforcement Learning. (arXiv:2306.12755v1 [cs.LG])
    Offline reinforcement learning (RL) aims to learn a policy using only pre-collected and fixed data. Although avoiding the time-consuming online interactions in RL, it poses challenges for out-of-distribution (OOD) state actions and often suffers from data inefficiency for training. Despite many efforts being devoted to addressing OOD state actions, the latter (data inefficiency) receives little attention in offline RL. To address this, this paper proposes the cross-domain offline RL, which assumes offline data incorporate additional source-domain data from varying transition dynamics (environments), and expects it to contribute to the offline data efficiency. To do so, we identify a new challenge of OOD transition dynamics, beyond the common OOD state actions issue, when utilizing cross-domain offline data. Then, we propose our method BOSA, which employs two support-constrained objectives to address the above OOD issues. Through extensive experiments in the cross-domain offline RL setting, we demonstrate BOSA can greatly improve offline data efficiency: using only 10\% of the target data, BOSA could achieve {74.4\%} of the SOTA offline RL performance that uses 100\% of the target data. Additionally, we also show BOSA can be effortlessly plugged into model-based offline RL and noising data augmentation techniques (used for generating source-domain data), which naturally avoids the potential dynamics mismatch between target-domain data and newly generated source-domain data.
    Multi-Task Learning with Loop Specific Attention for CDR Structure Prediction. (arXiv:2306.13045v1 [cs.LG])
The Complementarity Determining Region (CDR) structure prediction of loops in antibody engineering has attracted considerable attention from researchers. When designing antibodies, a main challenge is to predict the CDR structure of the H3 loop. Compared with the other CDR loops, that is, the H1 and H2 loops, the CDR structure of the H3 loop is more challenging to predict due to its varying length and flexible structure. In this paper, we propose a Multi-task learning model with Loop Specific Attention, namely MLSA. In particular, to the best of our knowledge, we are the first to jointly learn the three CDR loops via a novel multi-task learning strategy. In addition, to account for the structural and functional similarities and differences of the three CDR loops, we propose a loop-specific attention mechanism to control the influence of each CDR loop on the training of MLSA. Our experimental evaluation on widely used benchmark data shows that the proposed MLSA method significantly reduces the prediction error of the CDR structure of the H3 loop, by at least 19%, when compared with other baseline strategies. Finally, for reproduction purposes, we make the implementation of MLSA publicly available at https://anonymous.4open.science/r/MLSA-2442/.
    Investigating the effect of sub-word segmentation on the performance of transformer language models. (arXiv:2305.05480v2 [cs.CL] UPDATED)
We explore how morpheme-based sub-word segmentation affects the performance of language models. We trained GPT-2 and BERT models for both Finnish and Russian using StateMorph, a morpheme segmentation algorithm; for comparison, we also trained models with BPE and Morfessor segmentation. Our preliminary results show that StateMorph helps the models converge more efficiently and achieve better validation scores.
    PyKoopman: A Python Package for Data-Driven Approximation of the Koopman Operator. (arXiv:2306.12962v1 [eess.SY])
    PyKoopman is a Python package for the data-driven approximation of the Koopman operator associated with a dynamical system. The Koopman operator is a principled linear embedding of nonlinear dynamics and facilitates the prediction, estimation, and control of strongly nonlinear dynamics using linear systems theory. In particular, PyKoopman provides tools for data-driven system identification for unforced and actuated systems that build on the equation-free dynamic mode decomposition (DMD) and its variants. In this work, we provide a brief description of the mathematical underpinnings of the Koopman operator, an overview and demonstration of the features implemented in PyKoopman (with code examples), practical advice for users, and a list of potential extensions to PyKoopman. Software is available at this http URL
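As a minimal illustration of the computation the package builds on (this is textbook exact DMD in NumPy, not PyKoopman's own API):

```python
import numpy as np

def exact_dmd(X, Y, r):
    """Exact DMD: given snapshot matrices whose columns satisfy y_k = F(x_k),
    return the leading r eigenvalues and modes of the best-fit linear operator."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :r], s[:r], Vh[:r]
    A_tilde = U.conj().T @ Y @ Vh.conj().T / s   # r x r reduced operator
    eigvals, W = np.linalg.eig(A_tilde)
    modes = (Y @ Vh.conj().T / s) @ W / eigvals  # exact DMD modes
    return eigvals, modes
```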
    Taming Multi-Agent Reinforcement Learning with Estimator Variance Reduction. (arXiv:2209.01054v2 [cs.MA] UPDATED)
    Centralised training with decentralised execution (CT-DE) serves as the foundation of many leading multi-agent reinforcement learning (MARL) algorithms. Despite its popularity, it suffers from a critical drawback due to its reliance on learning from a single sample of the joint-action at a given state. As agents explore and update their policies during training, these single samples may poorly represent the actual joint-policy of the system of agents leading to high variance gradient estimates that hinder learning. To address this problem, we propose an enhancement tool that accommodates any actor-critic MARL method. Our framework, Performance Enhancing Reinforcement Learning Apparatus (PERLA), introduces a sampling technique of the agents' joint-policy into the critics while the agents train. This leads to TD updates that closely approximate the true expected value under the current joint-policy rather than estimates from a single sample of the joint-action at a given state. This produces low variance and precise estimates of expected returns, minimising the variance in the critic estimators which typically hinders learning. Moreover, as we demonstrate, by eliminating much of the critic variance from the single sampling of the joint policy, PERLA enables CT-DE methods to scale more efficiently with the number of agents. Theoretically, we prove that PERLA reduces variance in value estimates similar to that of decentralised training while maintaining the benefits of centralised training. Empirically, we demonstrate PERLA's superior performance and ability to reduce estimator variance in a range of benchmarks including Multi-agent Mujoco, and StarCraft II Multi-agent Challenge.
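The variance-reduction idea lends itself to a short sketch: bootstrap the critic from an average over several sampled joint actions rather than the single observed one. All names below (q_fn, policies) are hypothetical placeholders, not the paper's code.

```python
import numpy as np

def sampled_joint_value(q_fn, state, policies, n_samples=16, rng=None):
    """Monte-Carlo estimate of E_{a ~ joint policy}[Q(s, a)]: sample several
    joint actions from the agents' current policies and average the critic,
    instead of relying on the single joint action observed in the transition."""
    rng = rng or np.random.default_rng()
    values = []
    for _ in range(n_samples):
        joint_action = tuple(pi(state, rng) for pi in policies)  # one action per agent
        values.append(q_fn(state, joint_action))
    return float(np.mean(values))
```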
    Precision psychiatry: predicting predictability. (arXiv:2306.12462v1 [q-bio.QM])
Precision psychiatry is an emerging field that aims to provide individualized approaches to mental health care. Multivariate analysis and machine learning are used to create outcome prediction models based on clinical data such as demographics, symptom assessments, genetic information, and brain imaging. While much emphasis has been placed on technical innovation, the complex and varied nature of mental health presents significant challenges to the successful implementation of these models. From this perspective, I review ten challenges in the field of precision psychiatry, including the need for studies on real-world populations and realistic clinical outcome definitions, and the need to consider treatment-related factors such as placebo effects and non-adherence to prescriptions. Fairness, prospective validation in comparison to current practice, and implementation studies of prediction models are other key issues that are currently understudied. A shift is proposed from retrospective studies based on linear and static concepts of disease towards prospective research that considers the importance of contextual factors and the dynamic and complex nature of mental health.
    Data augmentation for recommender system: A semi-supervised approach using maximum margin matrix factorization. (arXiv:2306.13050v1 [cs.IR])
Collaborative filtering (CF) has become a popular method for developing recommender systems (RS), in which a user's ratings for new items are predicted based on her past preferences and the available preference information of other users. Despite the popularity of CF-based methods, their performance is often greatly limited by the sparsity of observed entries. In this study, we explore the data augmentation and refinement aspects of Maximum Margin Matrix Factorization (MMMF), a widely accepted CF technique for rating prediction, which have not been investigated before. We exploit the inherent characteristics of CF algorithms to assess the confidence level of individual ratings and propose a semi-supervised approach for rating augmentation based on self-training. We hypothesize that a CF algorithm's low-confidence predictions are due to some deficiency in the training data, and hence the algorithm's performance can be improved by adopting a systematic data augmentation strategy. We iteratively use some of the ratings predicted with high confidence to augment the training data and remove low-confidence entries through a refinement process. By repeating this process, the system learns to improve prediction accuracy. Our method is experimentally evaluated on several state-of-the-art CF algorithms and leads to informative rating augmentation, improving the performance of the baseline approaches.
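A schematic of the augment-and-refine loop, with a deliberately generic interface; model, confidence, and both thresholds are hypothetical placeholders rather than the paper's implementation:

```python
def augment_and_refine(model, train, pool, confidence, t_add, t_drop, rounds=5):
    """train/pool map (user, item) -> rating; confidence(model, u, i) scores
    how much the model trusts its prediction for the entry (u, i)."""
    for _ in range(rounds):
        model.fit(train)                                          # refit on current ratings
        scored = {k: (model.predict(*k), confidence(model, *k)) for k in list(pool)}
        for k, (pred, conf) in scored.items():
            if conf >= t_add:                                     # pseudo-label confident entries
                train[k] = pred
                del pool[k]
        train = {k: v for k, v in train.items()                   # refinement: drop low-confidence entries
                 if confidence(model, *k) >= t_drop}
    return model
```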
$\kappa$HGCN: Tree-likeness Modeling via Continuous and Discrete Curvature Learning. (arXiv:2212.01793v2 [cs.LG] UPDATED)
The prevalence of tree-like structures, encompassing hierarchical structures and power-law distributions, exists extensively in real-world applications, including recommendation systems, ecosystems, financial networks, social networks, etc. Recently, the exploitation of hyperbolic space for tree-likeness modeling has garnered considerable attention owing to its exponential growth volume. Compared to the flat Euclidean space, the curved hyperbolic space provides a more amenable and embeddable room, especially for datasets exhibiting implicit tree-like architectures. However, the intricate nature of real-world tree-like data presents a considerable challenge, as it frequently displays a heterogeneous composition of tree-like, flat, and circular regions. The direct embedding of such heterogeneous structures into a homogeneous embedding space (i.e., hyperbolic space) inevitably leads to heavy distortions. To mitigate this shortcoming, this study explores the curvature between the discrete structure and the continuous learning space, aiming to encode the message conveyed by the network topology in the learning process, thereby improving tree-likeness modeling. To this end, a curvature-aware hyperbolic graph convolutional neural network, $\kappa$HGCN, is proposed, which utilizes the curvature to guide message passing and improve long-range propagation. Extensive experiments on node classification and link prediction tasks verify the superiority of the proposal as it consistently outperforms various competitive models by a large margin.
    Exploring Antitrust and Platform Power in Generative AI. (arXiv:2306.11342v2 [cs.LG] UPDATED)
    The concentration of power in a few digital technology companies has become a subject of increasing interest in both academic and non-academic discussions. One of the most noteworthy contributions to the debate is Lina Khan's Amazon's Antitrust Paradox. In this work, Khan contends that Amazon has systematically exerted its dominance in online retail to eliminate competitors and subsequently charge above-market prices. This work contributed to Khan's appointment as the chair of the US Federal Trade Commission (FTC), one of the most influential antitrust organizations. Today, several ongoing antitrust lawsuits in the US and Europe involve major technology companies like Apple, Google/Alphabet, and Facebook/Meta. In the realm of generative AI, we are once again witnessing the same companies taking the lead in technological advancements, leaving little room for others to compete. This article examines the market dominance of these corporations in the technology stack behind generative AI from an antitrust law perspective.
    Learnability and Algorithm for Continual Learning. (arXiv:2306.12646v1 [cs.LG])
    This paper studies the challenging continual learning (CL) setting of Class Incremental Learning (CIL). CIL learns a sequence of tasks consisting of disjoint sets of concepts or classes. At any time, a single model is built that can be applied to predict/classify test instances of any classes learned thus far without providing any task related information for each test instance. Although many techniques have been proposed for CIL, they are mostly empirical. It has been shown recently that a strong CIL system needs a strong within-task prediction (WP) and a strong out-of-distribution (OOD) detection for each task. However, it is still not known whether CIL is actually learnable. This paper shows that CIL is learnable. Based on the theory, a new CIL algorithm is also proposed. Experimental results demonstrate its effectiveness.
    Evolving Computation Graphs. (arXiv:2306.12943v1 [cs.LG])
    Graph neural networks (GNNs) have demonstrated success in modeling relational data, especially for data that exhibits homophily: when a connection between nodes tends to imply that they belong to the same class. However, while this assumption is true in many relevant situations, there are important real-world scenarios that violate this assumption, and this has spurred research into improving GNNs for these cases. In this work, we propose Evolving Computation Graphs (ECGs), a novel method for enhancing GNNs on heterophilic datasets. Our approach builds on prior theoretical insights linking node degree, high homophily, and inter vs intra-class embedding similarity by rewiring the GNNs' computation graph towards adding edges that connect nodes that are likely to be in the same class. We utilise weaker classifiers to identify these edges, ultimately improving GNN performance on non-homophilic data as a result. We evaluate ECGs on a diverse set of recently-proposed heterophilous datasets and demonstrate improvements over the relevant baselines. ECG presents a simple, intuitive and elegant approach for improving GNN performance on heterophilic datasets without requiring prior domain knowledge.
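A sketch of the rewiring step under simplifying assumptions: here "likely same class" is approximated by k-nearest neighbours in the weak classifier's prediction space, which may differ from the paper's exact edge-selection rule.

```python
import numpy as np

def rewire_same_class_edges(weak_probs, k=3):
    """weak_probs: (n, num_classes) class distributions from a weak classifier.
    Connect each node to the k nodes with the most similar predictions,
    yielding candidate edges for the evolved computation graph."""
    P = weak_probs / np.linalg.norm(weak_probs, axis=1, keepdims=True)
    sim = P @ P.T
    np.fill_diagonal(sim, -np.inf)          # exclude self-loops
    nbrs = np.argsort(-sim, axis=1)[:, :k]  # top-k most similar nodes
    return [(i, int(j)) for i in range(len(P)) for j in nbrs[i]]
```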
    Generating Synergistic Formulaic Alpha Collections via Reinforcement Learning. (arXiv:2306.12964v1 [q-fin.ST])
    In the field of quantitative trading, it is common practice to transform raw historical stock data into indicative signals for the market trend. Such signals are called alpha factors. Alphas in formula forms are more interpretable and thus favored by practitioners concerned with risk. In practice, a set of formulaic alphas is often used together for better modeling precision, so we need to find synergistic formulaic alpha sets that work well together. However, most traditional alpha generators mine alphas one by one separately, overlooking the fact that the alphas would be combined later. In this paper, we propose a new alpha-mining framework that prioritizes mining a synergistic set of alphas, i.e., it directly uses the performance of the downstream combination model to optimize the alpha generator. Our framework also leverages the strong exploratory capabilities of reinforcement learning~(RL) to better explore the vast search space of formulaic alphas. The contribution to the combination models' performance is assigned to be the return used in the RL process, driving the alpha generator to find better alphas that improve upon the current set. Experimental evaluations on real-world stock market data demonstrate both the effectiveness and the efficiency of our framework for stock trend forecasting. The investment simulation results show that our framework is able to achieve higher returns compared to previous approaches.
    Distributed Sparse Regression via Penalization. (arXiv:2111.06530v2 [cs.LG] UPDATED)
We study sparse linear regression over a network of agents, modeled as an undirected graph (with no centralized node). The estimation problem is formulated as the minimization of the sum of the local LASSO loss functions plus a quadratic penalty of the consensus constraint -- the latter being instrumental to obtain distributed solution methods. While penalty-based consensus methods have been extensively studied in the optimization literature, their statistical and computational guarantees in the high dimensional setting remain unclear. This work provides an answer to this open problem. Our contribution is two-fold. First, we establish statistical consistency of the estimator: under a suitable choice of the penalty parameter, the optimal solution of the penalized problem achieves near optimal minimax rate $\mathcal{O}(s \log d/N)$ in $\ell_2$-loss, where $s$ is the sparsity value, $d$ is the ambient dimension, and $N$ is the total sample size in the network -- this matches centralized sample rates. Second, we show that the proximal-gradient algorithm applied to the penalized problem, which naturally leads to distributed implementations, converges linearly up to a tolerance of the order of the centralized statistical error -- the rate scales as $\mathcal{O}(d)$, revealing an unavoidable speed-accuracy dilemma. Numerical results demonstrate the tightness of the derived sample rate and convergence rate scalings.
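The single-node analogue of the algorithm analyzed above is plain proximal gradient (ISTA) on the LASSO objective; a minimal sketch (the distributed version adds the consensus penalty and neighbor communication):

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(A, b, lam, step, n_iters=500):
    """Proximal gradient for min_x 0.5 * ||A x - b||^2 + lam * ||x||_1.
    `step` should be at most 1 / ||A||_2^2 for guaranteed convergence."""
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        grad = A.T @ (A @ x - b)                         # gradient of the smooth part
        x = soft_threshold(x - step * grad, step * lam)  # proximal step
    return x
```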
    Complex Preferences for Different Convergent Priors in Discrete Graph Diffusion. (arXiv:2306.02957v2 [cs.LG] UPDATED)
Diffusion models have achieved state-of-the-art performance in generating many different kinds of data, including images, text, and videos. Despite their success, there has been limited research on how the underlying diffusion process and the final convergent prior affect generative performance; this research has also been limited to continuous data types and a score-based diffusion framework. To fill this gap, we explore how different discrete diffusion kernels (which converge to different prior distributions) affect the performance of diffusion models for graphs. To this end, we develop a novel formulation of a family of discrete diffusion kernels which are easily adjustable to converge to different Bernoulli priors, and we study the effect of these different kernels on generative performance. We show that the quality of generated graphs is sensitive to the prior used, and that the optimal choice cannot be explained by obvious statistics or metrics, challenging the intuitions suggested by previous works.
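For intuition, here is one concrete kernel with an adjustable Bernoulli prior (an illustrative choice, not necessarily the paper's parameterization): keep each binary entry with probability 1 - beta and resample it from Bernoulli(pi) otherwise. Iterating this converges to an i.i.d. Bernoulli(pi) prior, so pi tunes the convergent prior.

```python
import numpy as np

def forward_step(x, beta, pi, rng):
    """One forward-diffusion step on a binary array (e.g., an adjacency matrix):
    entries are independently resampled from Bernoulli(pi) with probability beta."""
    resample = rng.random(x.shape) < beta
    fresh = (rng.random(x.shape) < pi).astype(x.dtype)
    return np.where(resample, fresh, x)
```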
    PhAST: Physics-Aware, Scalable, and Task-specific GNNs for Accelerated Catalyst Design. (arXiv:2211.12020v3 [cs.LG] UPDATED)
Mitigating the climate crisis requires a rapid transition towards lower-carbon energy. Catalyst materials play a crucial role in the electrochemical reactions involved in numerous industrial processes key to this transition, such as renewable energy storage and electrofuel synthesis. To reduce the energy spent on such activities, we must quickly discover more efficient catalysts to drive electrochemical reactions. Machine learning (ML) holds the potential to efficiently model materials properties from large amounts of data, accelerating electrocatalyst design. The Open Catalyst Project OC20 dataset was constructed to that end. However, ML models trained on OC20 are still neither scalable nor accurate enough for practical applications. In this paper, we propose task-specific innovations applicable to most architectures, enhancing both computational efficiency and accuracy. This includes improvements in (1) the graph creation step, (2) atom representations, (3) the energy prediction head, and (4) the force prediction head. We describe these contributions and evaluate them thoroughly on multiple architectures. Overall, our proposed PhAST improvements reduce energy MAE by 4 to 42$\%$ while dividing compute time by 3 to 8$\times$ depending on the targeted task/model. PhAST also enables CPU training, leading to 40$\times$ speedups in highly parallelized settings. Python package: \url{https://phast.readthedocs.io}.
    A One-Sample Decentralized Proximal Algorithm for Non-Convex Stochastic Composite Optimization. (arXiv:2302.09766v2 [math.OC] UPDATED)
    We focus on decentralized stochastic non-convex optimization, where $n$ agents work together to optimize a composite objective function which is a sum of a smooth term and a non-smooth convex term. To solve this problem, we propose two single-time scale algorithms: Prox-DASA and Prox-DASA-GT. These algorithms can find $\epsilon$-stationary points in $\mathcal{O}(n^{-1}\epsilon^{-2})$ iterations using constant batch sizes (i.e., $\mathcal{O}(1)$). Unlike prior work, our algorithms achieve comparable complexity without requiring large batch sizes, more complex per-iteration operations (such as double loops), or stronger assumptions. Our theoretical findings are supported by extensive numerical experiments, which demonstrate the superiority of our algorithms over previous approaches. Our code is available at https://github.com/xuxingc/ProxDASA.
    Evaluating Online Bandit Exploration In Large-Scale Recommender System. (arXiv:2304.02572v2 [cs.IR] UPDATED)
Bandit learning has been an increasingly popular design choice for recommender systems. Despite the strong interest in bandit learning from the community, there remain multiple bottlenecks that prevent many bandit learning approaches from being productionized. One major bottleneck is how to test the effectiveness of a bandit algorithm fairly and without data leakage. Unlike supervised learning algorithms, bandit learning algorithms rely heavily on the data collection process through their explorative nature. Such explorative behavior may induce unfair evaluation in a classic A/B test setting. In this work, we apply the upper confidence bound (UCB) algorithm to our large-scale short-video recommender system and present a test framework for the production bandit learning life-cycle with a new set of metrics. Extensive experimental results show that our experiment design is able to fairly evaluate the performance of bandit learning in the recommender system.
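For reference, this is the textbook UCB1 index the abstract refers to; the production system's specific tuning is not described in the abstract.

```python
import numpy as np

def ucb1_pick(pulls, reward_sums, t, c=2.0):
    """UCB1: play the arm maximizing empirical mean plus an exploration bonus.
    pulls[i] = times arm i was played; reward_sums[i] = its summed reward.
    Unplayed arms are tried first (their index is +inf)."""
    means = reward_sums / np.maximum(pulls, 1)
    bonus = np.sqrt(c * np.log(max(t, 2)) / np.maximum(pulls, 1))
    return int(np.argmax(np.where(pulls == 0, np.inf, means + bonus)))
```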
    Interpretation of immunofluorescence slides by deep learning techniques: anti-nuclear antibodies case study. (arXiv:2306.12432v1 [q-bio.QM])
Diseases are increasing in number and severity. Immune diseases, which affected 8% of the world population in 2017 according to the World Health Organization (WHO), are an area of medicine that warrants attention due to the high rate of disease occurrence in this category. This work presents an up-to-date review of state-of-the-art immune disease healthcare solutions. We focus on tackling the issue with modern solutions such as Deep Learning to detect anomalies in the early stages, hence providing health practitioners with efficient tools. We rely on deep learning techniques such as Convolutional Neural Networks (CNNs) to deliver an efficient tool, together with a thorough analysis of the solution. The proposed solution was tested and evaluated by the immunology department of the Principal Military Hospital of Instruction of Tunis, which considered it a very helpful tool.
    Quilt-1M: One Million Image-Text Pairs for Histopathology. (arXiv:2306.11207v2 [cs.CV] UPDATED)
    Recent accelerations in multi-modal applications have been made possible with the plethora of image and text data available online. However, the scarcity of analogous data in the medical field, specifically in histopathology, has halted comparable progress. To enable similar representation learning for histopathology, we turn to YouTube, an untapped resource of videos, offering $1,087$ hours of valuable educational histopathology videos from expert clinicians. From YouTube, we curate Quilt: a large-scale vision-language dataset consisting of $768,826$ image and text pairs. Quilt was automatically curated using a mixture of models, including large language models, handcrafted algorithms, human knowledge databases, and automatic speech recognition. In comparison, the most comprehensive datasets curated for histopathology amass only around $200$K samples. We combine Quilt with datasets from other sources, including Twitter, research papers, and the internet in general, to create an even larger dataset: Quilt-1M, with $1$M paired image-text samples, marking it as the largest vision-language histopathology dataset to date. We demonstrate the value of Quilt-1M by fine-tuning a pre-trained CLIP model. Our model outperforms state-of-the-art models on both zero-shot and linear probing tasks for classifying new histopathology images across $13$ diverse patch-level datasets of $8$ different sub-pathologies and cross-modal retrieval tasks.
    Achieving Sample and Computational Efficient Reinforcement Learning by Action Space Reduction via Grouping. (arXiv:2306.12981v1 [cs.LG])
Reinforcement learning often needs to deal with the exponential growth of states and actions when exploring optimal control in high-dimensional spaces (often known as the curse of dimensionality). In this work, we address this issue by learning the inherent structure of action-wise similar MDPs to appropriately balance performance degradation against sample/computational complexity. In particular, we partition the action space into multiple groups based on similarity in transition distributions and reward functions, and build a linear decomposition model to capture the difference between the intra-group transition kernel and the intra-group rewards. Both our theoretical analysis and experiments reveal a \emph{surprising and counter-intuitive result}: while a more refined grouping strategy can reduce the approximation error caused by treating actions in the same group as identical, it also leads to increased estimation error when samples or computational resources are limited. This finding highlights the grouping strategy as a new degree of freedom that can be optimized to minimize the overall performance loss. To address this issue, we formulate a general optimization problem for determining the optimal grouping strategy, which strikes a balance between performance loss and sample/computational complexity. We further propose a computationally efficient method for selecting a nearly-optimal grouping strategy, which maintains its computational complexity independent of the size of the action space.
    Classification Tree Pruning Under Covariate Shift. (arXiv:2305.04335v2 [stat.ML] UPDATED)
    We consider the problem of \emph{pruning} a classification tree, that is, selecting a suitable subtree that balances bias and variance, in common situations with inhomogeneous training data. Namely, assuming access to mostly data from a distribution $P_{X, Y}$, but little data from a desired distribution $Q_{X, Y}$ with different $X$-marginals, we present the first efficient procedure for optimal pruning in such situations, when cross-validation and other penalized variants are grossly inadequate. Optimality is derived with respect to a notion of \emph{average discrepancy} $P_{X} \to Q_{X}$ (averaged over $X$ space) which significantly relaxes a recent notion -- termed \emph{transfer-exponent} -- shown to tightly capture the limits of classification under such a distribution shift. Our relaxed notion can be viewed as a measure of \emph{relative dimension} between distributions, as it relates to existing notions of information such as the Minkowski and Renyi dimensions.
    Federated Few-shot Learning. (arXiv:2306.10234v2 [cs.LG] UPDATED)
    Federated Learning (FL) enables multiple clients to collaboratively learn a machine learning model without exchanging their own local data. In this way, the server can exploit the computational power of all clients and train the model on a larger set of data samples among all clients. Although such a mechanism is proven to be effective in various fields, existing works generally assume that each client preserves sufficient data for training. In practice, however, certain clients may only contain a limited number of samples (i.e., few-shot samples). For example, the available photo data taken by a specific user with a new mobile device is relatively rare. In this scenario, existing FL efforts typically encounter a significant performance drop on these clients. Therefore, it is urgent to develop a few-shot model that can generalize to clients with limited data under the FL scenario. In this paper, we refer to this novel problem as federated few-shot learning. Nevertheless, the problem remains challenging due to two major reasons: the global data variance among clients (i.e., the difference in data distributions among clients) and the local data insufficiency in each client (i.e., the lack of adequate local data for training). To overcome these two challenges, we propose a novel federated few-shot learning framework with two separately updated models and dedicated training strategies to reduce the adverse impact of global data variance and local data insufficiency. Extensive experiments on four prevalent datasets that cover news articles and images validate the effectiveness of our framework compared with the state-of-the-art baselines. Our code is provided at https://github.com/SongW-SW/F2L.
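For context, the baseline aggregation step that any such FL framework builds on is a size-weighted parameter average; a plain FedAvg sketch, not the paper's two-model F2L framework:

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """client_params: one list of per-layer ndarrays per client.
    Returns the global parameters as the size-weighted average."""
    total = float(sum(client_sizes))
    return [sum(w[i] * (n / total) for w, n in zip(client_params, client_sizes))
            for i in range(len(client_params[0]))]
```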
    Sharp analysis of EM for learning mixtures of pairwise differences. (arXiv:2302.10066v2 [math.ST] UPDATED)
    We consider a symmetric mixture of linear regressions with random samples from the pairwise comparison design, which can be seen as a noisy version of a type of Euclidean distance geometry problem. We analyze the expectation-maximization (EM) algorithm locally around the ground truth and establish that the sequence converges linearly, providing an $\ell_\infty$-norm guarantee on the estimation error of the iterates. Furthermore, we show that the limit of the EM sequence achieves the sharp rate of estimation in the $\ell_2$-norm, matching the information-theoretically optimal constant. We also argue through simulation that convergence from a random initialization is much more delicate in this setting, and does not appear to occur in general. Our results show that the EM algorithm can exhibit several unique behaviors when the covariate distribution is suitably structured.
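The EM iteration for this model is compact enough to write out. This is the textbook update for the symmetric two-component mixture $y = s \langle x, \theta \rangle + \text{noise}$ with $s = \pm 1$ equiprobable; the pairwise-comparison design enters only through the covariates, and this sketch is not the authors' code.

```python
import numpy as np

def em_symmetric_mlr(X, y, theta0, sigma2=1.0, n_iters=50):
    """EM for y_i = s_i * <x_i, theta> + N(0, sigma2), s_i = +/-1 equiprobable.
    E-step: posterior mean of s_i; M-step: weighted least squares."""
    theta = theta0.copy()
    XtX_inv = np.linalg.inv(X.T @ X)
    for _ in range(n_iters):
        w = np.tanh(y * (X @ theta) / sigma2)  # E-step: E[s_i | x_i, y_i, theta]
        theta = XtX_inv @ (X.T @ (w * y))      # M-step
    return theta
```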
    VL-CheckList: Evaluating Pre-trained Vision-Language Models with Objects, Attributes and Relations. (arXiv:2207.00221v2 [cs.CV] UPDATED)
Vision-Language Pretraining (VLP) models have recently successfully facilitated many cross-modal downstream tasks. Most existing works evaluated their systems by comparing fine-tuned downstream task performance. However, average downstream task accuracy alone provides little information about the pros and cons of each VLP method, let alone insight into how the community can improve the systems in the future. Inspired by the CheckList for testing natural language processing, we introduce VL-CheckList, a novel framework to understand the capabilities of VLP models. The proposed method divides the image-text ability of a VLP model into three categories: objects, attributes, and relations, and uses a novel taxonomy to further break down these three aspects. We conduct comprehensive studies to analyze seven recently popular VLP models via the proposed framework. Results confirm the effectiveness of the proposed method by revealing fine-grained differences among the compared models that were not visible from downstream-task-only evaluation. Further results show promising research directions in building better VLP models. Our data and code are available at: https://github.com/om-ai-lab/VL-CheckList.
    Sample Complexity for Quadratic Bandits: Hessian Dependent Bounds and Optimal Algorithms. (arXiv:2306.12383v2 [cs.LG] UPDATED)
    In stochastic zeroth-order optimization, a problem of practical relevance is understanding how to fully exploit the local geometry of the underlying objective function. We consider a fundamental setting in which the objective function is quadratic, and provide the first tight characterization of the optimal Hessian-dependent sample complexity. Our contribution is twofold. First, from an information-theoretic point of view, we prove tight lower bounds on Hessian-dependent complexities by introducing a concept called energy allocation, which captures the interaction between the searching algorithm and the geometry of objective functions. A matching upper bound is obtained by solving the optimal energy spectrum. Then, algorithmically, we show the existence of a Hessian-independent algorithm that universally achieves the asymptotic optimal sample complexities for all Hessian instances. The optimal sample complexities achieved by our algorithm remain valid for heavy-tailed noise distributions, which are enabled by a truncation method.
    Evading Forensic Classifiers with Attribute-Conditioned Adversarial Faces. (arXiv:2306.13091v1 [cs.CV])
    The ability of generative models to produce highly realistic synthetic face images has raised security and ethical concerns. As a first line of defense against such fake faces, deep learning based forensic classifiers have been developed. While these forensic models can detect whether a face image is synthetic or real with high accuracy, they are also vulnerable to adversarial attacks. Although such attacks can be highly successful in evading detection by forensic classifiers, they introduce visible noise patterns that are detectable through careful human scrutiny. Additionally, these attacks assume access to the target model(s) which may not always be true. Attempts have been made to directly perturb the latent space of GANs to produce adversarial fake faces that can circumvent forensic classifiers. In this work, we go one step further and show that it is possible to successfully generate adversarial fake faces with a specified set of attributes (e.g., hair color, eye size, race, gender, etc.). To achieve this goal, we leverage the state-of-the-art generative model StyleGAN with disentangled representations, which enables a range of modifications without leaving the manifold of natural images. We propose a framework to search for adversarial latent codes within the feature space of StyleGAN, where the search can be guided either by a text prompt or a reference image. We also propose a meta-learning based optimization strategy to achieve transferable performance on unknown target models. Extensive experiments demonstrate that the proposed approach can produce semantically manipulated adversarial fake faces, which are true to the specified attribute set and can successfully fool forensic face classifiers, while remaining undetectable by humans. Code: https://github.com/koushiksrivats/face_attribute_attack.
    Blended-NeRF: Zero-Shot Object Generation and Blending in Existing Neural Radiance Fields. (arXiv:2306.12760v1 [cs.CV])
    Editing a local region or a specific object in a 3D scene represented by a NeRF is challenging, mainly due to the implicit nature of the scene representation. Consistently blending a new realistic object into the scene adds an additional level of difficulty. We present Blended-NeRF, a robust and flexible framework for editing a specific region of interest in an existing NeRF scene, based on text prompts or image patches, along with a 3D ROI box. Our method leverages a pretrained language-image model to steer the synthesis towards a user-provided text prompt or image patch, along with a 3D MLP model initialized on an existing NeRF scene to generate the object and blend it into a specified region in the original scene. We allow local editing by localizing a 3D ROI box in the input scene, and seamlessly blend the content synthesized inside the ROI with the existing scene using a novel volumetric blending technique. To obtain natural looking and view-consistent results, we leverage existing and new geometric priors and 3D augmentations for improving the visual fidelity of the final result. We test our framework both qualitatively and quantitatively on a variety of real 3D scenes and text prompts, demonstrating realistic multi-view consistent results with much flexibility and diversity compared to the baselines. Finally, we show the applicability of our framework for several 3D editing applications, including adding new objects to a scene, removing/replacing/altering existing objects, and texture conversion.
    Controlling Privacy Loss in Sampling Schemes: an Analysis of Stratified and Cluster Sampling. (arXiv:2007.12674v2 [stat.ME] UPDATED)
    Sampling schemes are fundamental tools in statistics, survey design, and algorithm design. A fundamental result in differential privacy is that a differentially private mechanism run on a simple random sample of a population provides stronger privacy guarantees than the same algorithm run on the entire population. However, in practice, sampling designs are often more complex than the simple, data-independent sampling schemes that are addressed in prior work. In this work, we extend the study of privacy amplification results to more complex, data-dependent sampling schemes. We find that not only do these sampling schemes often fail to amplify privacy, they can actually result in privacy degradation. We analyze the privacy implications of the pervasive cluster sampling and stratified sampling paradigms, as well as provide some insight into the study of more general sampling designs.
    On the explainable properties of 1-Lipschitz Neural Networks: An Optimal Transport Perspective. (arXiv:2206.06854v2 [cs.AI] UPDATED)
Input gradients have a pivotal role in a variety of applications, including adversarial attack algorithms for evaluating model robustness, explainable AI techniques for generating Saliency Maps, and counterfactual explanations. However, Saliency Maps generated by traditional neural networks are often noisy and provide limited insights. In this paper, we demonstrate that, on the contrary, the Saliency Maps of 1-Lipschitz neural networks, learnt with the dual loss of an optimal transportation problem, exhibit desirable XAI properties: they are highly concentrated on the essential parts of the image with low noise, significantly outperforming state-of-the-art explanation approaches across various models and metrics. We also prove that these maps align unprecedentedly well with human explanations on ImageNet. To explain the particularly beneficial properties of the Saliency Map for such models, we prove that this gradient encodes both the direction of the transportation plan and the direction towards the nearest adversarial attack. Following the gradient down to the decision boundary is no longer considered an adversarial attack, but rather a counterfactual explanation that explicitly transports the input from one class to another. Thus, learning with such a loss jointly optimizes the classification objective and the alignment of the gradient, i.e. the Saliency Map, with the transportation plan direction. These networks were previously known to be certifiably robust by design, and we demonstrate that they scale well for large problems and models, and are tailored for explainability using a fast and straightforward method.
    Squeeze, Recover and Relabel: Dataset Condensation at ImageNet Scale From A New Perspective. (arXiv:2306.13092v1 [cs.CV])
    We present a new dataset condensation framework termed Squeeze, Recover and Relabel (SRe$^2$L) that decouples the bilevel optimization of model and synthetic data during training, to handle varying scales of datasets, model architectures and image resolutions for effective dataset condensation. The proposed method demonstrates flexibility across diverse dataset scales and exhibits multiple advantages in terms of arbitrary resolutions of synthesized images, low training cost and memory consumption with high-resolution training, and the ability to scale up to arbitrary evaluation network architectures. Extensive experiments are conducted on Tiny-ImageNet and full ImageNet-1K datasets. Under 50 IPC, our approach achieves the highest 42.5% and 60.8% validation accuracy on Tiny-ImageNet and ImageNet-1K, outperforming all previous state-of-the-art methods by margins of 14.5% and 32.9%, respectively. Our approach also outperforms MTT by approximately 52$\times$ (ConvNet-4) and 16$\times$ (ResNet-18) faster in speed with less memory consumption of 11.6$\times$ and 6.4$\times$ during data synthesis. Our code and condensed datasets of 50, 200 IPC with 4K recovery budget are available at https://zeyuanyin.github.io/projects/SRe2L/.
    Decentralized Multi-Agent Reinforcement Learning with Global State Prediction. (arXiv:2306.12926v1 [cs.RO])
Deep reinforcement learning (DRL) has seen remarkable success in the control of single robots. However, applying DRL to robot swarms presents significant challenges. A critical challenge is non-stationarity, which occurs when two or more robots update individual or shared policies concurrently, thereby engaging in an interdependent training process with no guarantees of convergence. Circumventing non-stationarity typically involves training the robots with global information about other agents' states and/or actions. In contrast, in this paper we explore how to remove the need for global information. We pose our problem as a Partially Observable Markov Decision Process, due to the absence of global knowledge on other agents. Using collective transport as a testbed scenario, we study two approaches to multi-agent training. In the first, the robots exchange no messages, and are trained to rely on implicit communication through push-and-pull on the object to transport. In the second approach, we introduce Global State Prediction (GSP), a network trained to form a belief over the swarm as a whole and predict its future states. We provide a comprehensive study over four well-known deep reinforcement learning algorithms in environments with obstacles, measuring performance as the successful transport of the object to the goal within a desired time-frame. Through an ablation study, we show that including GSP boosts performance and increases robustness when compared with methods that use global knowledge.
    Triggering Dark Showers with Conditional Dual Auto-Encoders. (arXiv:2306.12955v1 [hep-ex])
    Auto-encoders (AEs) have the potential to be effective and generic tools for new physics searches at colliders, requiring little to no model-dependent assumptions. New hypothetical physics signals can be considered anomalies that deviate from the well-known background processes generally expected to describe the whole dataset. We present a search formulated as an anomaly detection (AD) problem, using an AE to define a criterion to decide about the physics nature of an event. In this work, we perform an AD search for manifestations of a dark version of strong force using raw detector images, which are large and very sparse, without leveraging any physics-based pre-processing or assumption on the signals. We propose a dual-encoder design which can learn a compact latent space through conditioning. In the context of multiple AD metrics, we present a clear improvement over competitive baselines and prior approaches. It is the first time that an AE is shown to exhibit excellent discrimination against multiple dark shower models, illustrating the suitability of this method as a performant, model-independent algorithm to deploy, e.g., in the trigger stage of LHC experiments such as ATLAS and CMS.
    A Semi-supervised Sensing Rate Learning based CMAB Scheme to Combat COVID-19 by Trustful Data Collection in the Crowd. (arXiv:2301.08563v2 [cs.HC] UPDATED)
The recruitment of trustworthy and high-quality workers is an important research issue for mobile crowd sensing (MCS). Previous studies either assume that the qualities of workers are known in advance, or assume that the platform knows the qualities of workers once it receives their collected data. In reality, to reduce costs and thus maximize revenue, many strategic workers do not perform their sensing tasks honestly and instead report fake data to the platform; these are called false data attacks, and it is very hard for the platform to evaluate the authenticity of the received data. In this paper, an incentive mechanism named Semi-supervision based Combinatorial Multi-Armed Bandit reverse Auction (SCMABA) is proposed to solve the recruitment problem of multiple unknown and strategic workers in MCS. First, we model worker recruitment as a multi-armed bandit reverse auction problem and design a UCB-based algorithm to separate exploration from exploitation, regarding the Sensing Rates (SRs) of recruited workers as the gain of the bandit. Next, a Semi-supervised Sensing Rate Learning (SSRL) approach is proposed to quickly and accurately obtain the workers' SRs, consisting of two phases: supervision and self-supervision. Last, SCMABA is designed by organically combining the SR-acquisition mechanism with the multi-armed bandit reverse auction, where supervised SR learning is used in exploration and self-supervised learning in exploitation. We theoretically prove that SCMABA achieves truthfulness and individual rationality, and we demonstrate its outstanding performance through in-depth simulations on real-world data traces.
    Auditing Predictive Models for Intersectional Biases. (arXiv:2306.13064v1 [cs.LG])
    Predictive models that satisfy group fairness criteria in aggregate for members of a protected class, but do not guarantee subgroup fairness, could produce biased predictions for individuals at the intersection of two or more protected classes. To address this risk, we propose Conditional Bias Scan (CBS), a flexible auditing framework for detecting intersectional biases in classification models. CBS identifies the subgroup for which there is the most significant bias against the protected class, as compared to the equivalent subgroup in the non-protected class, and can incorporate multiple commonly used fairness definitions for both probabilistic and binarized predictions. We show that this methodology can detect previously unidentified intersectional and contextual biases in the COMPAS pre-trial risk assessment tool and has higher bias detection power compared to similar methods that audit for subgroup fairness.
    Machine-Learning-Assisted and Real-Time-Feedback-Controlled Growth of InAs/GaAs Quantum Dots. (arXiv:2306.12898v1 [cond-mat.mes-hall])
Self-assembled InAs/GaAs quantum dots (QDs) have properties highly valuable for developing various optoelectronic devices such as QD lasers and single-photon sources. These applications rely strongly on the density and quality of the dots, which has motivated studies of growth-process control to realize high-quality epi-wafers and devices. Establishing the process parameters in molecular beam epitaxy (MBE) for a specific density of QDs is a multidimensional optimization challenge, usually addressed through time-consuming and iterative trial-and-error. Meanwhile, reflection high-energy electron diffraction (RHEED) has been widely used to capture a wealth of growth information in situ, but it still faces the challenge of extracting information from noisy and overlapping images. Here, based on a 3D ResNet, we developed a machine learning (ML) model specially designed for training on RHEED videos instead of static images, providing real-time feedback on surface morphologies for process control. We demonstrated that ML from previous growths could predict the post-growth density of QDs, successfully tuning the QD density in near-real time from $1.5\times10^{10}$ cm$^{-2}$ down to $3.8\times10^{8}$ cm$^{-2}$ or up to $1.4\times10^{11}$ cm$^{-2}$. Compared to traditional methods, our approach, with in-situ tuning capabilities and excellent reliability, can dramatically expedite the material optimization process and improve the reproducibility of MBE growth, constituting significant progress for thin-film growth techniques. The concepts and methodologies demonstrated in this work are promising for application to a variety of material growth processes, which could transform semiconductor manufacturing for the microelectronic and optoelectronic industries.
    Approximation Algorithms for Fair Range Clustering. (arXiv:2306.06778v2 [cs.LG] UPDATED)
This paper studies the fair range clustering problem, in which the data points are from different demographic groups and the goal is to pick $k$ centers with the minimum clustering cost such that each group is at least minimally represented in the centers set and no group dominates the centers set. More precisely, given a set of $n$ points in a metric space $(P,d)$ where each point belongs to one of the $\ell$ different demographics (i.e., $P = P_1 \uplus P_2 \uplus \cdots \uplus P_\ell$) and a set of $\ell$ intervals $[\alpha_1, \beta_1], \cdots, [\alpha_\ell, \beta_\ell]$ on the desired number of centers from each group, the goal is to pick a set of $k$ centers $C$ with minimum $\ell_p$-clustering cost (i.e., $(\sum_{v\in P} d(v,C)^p)^{1/p}$) such that for each group $i \in [\ell]$, $|C\cap P_i| \in [\alpha_i, \beta_i]$. In particular, fair range $\ell_p$-clustering captures fair range $k$-center, $k$-median and $k$-means as its special cases. In this work, we provide efficient constant-factor approximation algorithms for fair range $\ell_p$-clustering for all values of $p\in [1,\infty)$.
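To make the constraints concrete, here is an evaluator that checks a candidate center set against the per-group intervals and computes its $\ell_p$ cost; the paper's approximation algorithms themselves are not reproduced.

```python
import numpy as np

def fair_range_cost(points, group_of, centers, intervals, p=2):
    """points: (n, d) array; group_of[i] in {0, ..., l-1}; centers: point indices;
    intervals[g] = (alpha_g, beta_g). Returns the l_p cost, or None if infeasible."""
    for g, (lo, hi) in enumerate(intervals):
        k_g = sum(1 for c in centers if group_of[c] == g)
        if not lo <= k_g <= hi:
            return None  # fair range constraint violated for group g
    C = points[list(centers)]
    d = np.linalg.norm(points[:, None, :] - C[None, :, :], axis=2).min(axis=1)
    return float((d ** p).sum() ** (1.0 / p))
```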
    Inferring the finest pattern of mutual independence from data. (arXiv:2306.12984v1 [stat.ML])
For a random variable $X$, we are interested in the blind extraction of its finest mutual independence pattern $\mu ( X )$. We introduce a specific kind of independence that we call dichotomic. If $\Delta ( X )$ stands for the set of all patterns of dichotomic independence that hold for $X$, we show that $\mu ( X )$ can be obtained as the intersection of all elements of $\Delta ( X )$. We then propose a method to estimate $\Delta ( X )$ when the data are independent and identically distributed (i.i.d.) realizations of a multivariate normal distribution. If $\hat{\Delta} ( X )$ is the estimated set of valid patterns of dichotomic independence, we estimate $\mu ( X )$ as the intersection of all patterns of $\hat{\Delta} ( X )$. The method is tested on simulated data, showing its advantages and limits. We also consider an application to a toy example as well as to experimental data.
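Under the Gaussian assumption, a bipartition $(S, S^c)$ is a pattern of dichotomic independence exactly when the cross-covariance block between $S$ and $S^c$ vanishes; a crude empirical check of that block (the paper's actual estimation procedure is more careful than this sketch):

```python
import numpy as np

def cross_block_max(X, S):
    """X: (n_samples, n_vars) i.i.d. rows; S: indices of one side of the bipartition.
    Returns the largest absolute entry of the empirical cross-covariance block."""
    S = np.asarray(S)
    Sc = np.setdiff1d(np.arange(X.shape[1]), S)
    Sigma = np.cov(X, rowvar=False)
    return float(np.abs(Sigma[np.ix_(S, Sc)]).max())
```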
    Learning from Visual Observation via Offline Pretrained State-to-Go Transformer. (arXiv:2306.12860v1 [cs.LG])
    Learning from visual observation (LfVO), aiming at recovering policies from only visual observation data, is promising yet a challenging problem. Existing LfVO approaches either only adopt inefficient online learning schemes or require additional task-specific information like goal states, making them not suited for open-ended tasks. To address these issues, we propose a two-stage framework for learning from visual observation. In the first stage, we introduce and pretrain State-to-Go (STG) Transformer offline to predict and differentiate latent transitions of demonstrations. Subsequently, in the second stage, the STG Transformer provides intrinsic rewards for downstream reinforcement learning tasks where an agent learns merely from intrinsic rewards. Empirical results on Atari and Minecraft show that our proposed method outperforms baselines and in some tasks even achieves performance comparable to the policy learned from environmental rewards. These results shed light on the potential of utilizing video-only data to solve difficult visual reinforcement learning tasks rather than relying on complete offline datasets containing states, actions, and rewards. The project's website and code can be found at https://sites.google.com/view/stgtransformer.
    Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing. (arXiv:2306.12929v1 [cs.LG])
    Transformer models have been widely adopted in various domains over the last years, and especially large language models have advanced the field of AI significantly. Due to their size, the capability of these networks has increased tremendously, but this has come at the cost of a significant increase in necessary compute. Quantization is one of the most effective ways to reduce the computational time and memory consumption of neural networks. Many studies have shown, however, that modern transformer models tend to learn strong outliers in their activations, making them difficult to quantize. To retain acceptable performance, the existence of these outliers requires activations to be in higher bitwidth or the use of different numeric formats, extra fine-tuning, or other workarounds. We show that strong outliers are related to very specific behavior of attention heads that try to learn a "no-op" or just a partial update of the residual. To achieve the exact zeros needed in the attention matrix for a no-update, the input to the softmax is pushed to be larger and larger during training, causing outliers in other parts of the network. Based on these observations, we propose two simple (independent) modifications to the attention mechanism - clipped softmax and gated attention. We empirically show that models pre-trained using our methods learn significantly smaller outliers while maintaining and sometimes even improving the floating-point task performance. This enables us to quantize transformers to full INT8 quantization of the activations without any additional effort. We demonstrate the effectiveness of our methods on both language models (BERT, OPT) and vision transformers.
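The clipped softmax admits a one-line reading of the abstract: stretch the softmax output slightly beyond [0, 1] and clip back, so exact zeros (and ones) become reachable at finite logit magnitudes. The stretch parameters below are illustrative values, not the paper's tuned ones.

```python
import torch

def clipped_softmax(logits, zeta=1.0, gamma=-0.03, dim=-1):
    """clip((zeta - gamma) * softmax(x) + gamma, 0, 1): with gamma < 0, attention
    weights can hit exact zero without the logits growing unboundedly, removing
    the pressure that produces activation outliers."""
    p = torch.softmax(logits, dim=dim)
    return torch.clamp((zeta - gamma) * p + gamma, min=0.0, max=1.0)
```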
    Beyond Deep Ensembles: A Large-Scale Evaluation of Bayesian Deep Learning under Distribution Shift. (arXiv:2306.12306v2 [cs.LG] UPDATED)
    Bayesian deep learning (BDL) is a promising approach to achieve well-calibrated predictions on distribution-shifted data. Nevertheless, there exists no large-scale survey that evaluates recent SOTA methods on diverse, realistic, and challenging benchmark tasks in a systematic manner. To provide a clear picture of the current state of BDL research, we evaluate modern BDL algorithms on real-world datasets from the WILDS collection containing challenging classification and regression tasks, with a focus on generalization capability and calibration under distribution shift. We compare the algorithms on a wide range of large, convolutional and transformer-based neural network architectures. In particular, we investigate a signed version of the expected calibration error that reveals whether the methods are over- or under-confident, providing further insight into the behavior of the methods. Further, we provide the first systematic evaluation of BDL for fine-tuning large pre-trained models, where training from scratch is prohibitively expensive. Finally, given the recent success of Deep Ensembles, we extend popular single-mode posterior approximations to multiple modes by the use of ensembles. While we find that ensembling single-mode approximations generally improves the generalization capability and calibration of the models by a significant margin, we also identify a failure mode of ensembles when finetuning large transformer-based language models. In this setting, variational inference based approaches such as last-layer Bayes By Backprop outperform other methods in terms of accuracy by a large margin, while modern approximate inference algorithms such as SWAG achieve the best calibration.
    A Survey of Deep Learning for Mathematical Reasoning. (arXiv:2212.10535v2 [cs.AI] UPDATED)
    Mathematical reasoning is a fundamental aspect of human intelligence and is applicable in various fields, including science, engineering, finance, and everyday life. The development of artificial intelligence (AI) systems capable of solving math problems and proving theorems has garnered significant interest in the fields of machine learning and natural language processing. For example, mathematics serves as a testbed for aspects of reasoning that are challenging for powerful deep learning models, driving new algorithmic and modeling advances. On the other hand, recent advances in large-scale neural language models have opened up new benchmarks and opportunities to use deep learning for mathematical reasoning. In this survey paper, we review the key tasks, datasets, and methods at the intersection of mathematical reasoning and deep learning over the past decade. We also evaluate existing benchmarks and methods, and discuss future research directions in this domain.
    Wind Noise Reduction with a Diffusion-based Stochastic Regeneration Model. (arXiv:2306.12867v1 [eess.AS])
    In this paper we present a method for single-channel wind noise reduction using our previously proposed diffusion-based stochastic regeneration model combining predictive and generative modelling. We introduce a non-additive speech in noise model to account for the non-linear deformation of the membrane caused by the wind flow and possible clipping. We show that our stochastic regeneration model outperforms other neural-network-based wind noise reduction methods as well as purely predictive and generative models, on a dataset using simulated and real-recorded wind noise. We further show that the proposed method generalizes well by testing on an unseen dataset with real-recorded wind noise. Audio samples, data generation scripts and code for the proposed methods can be found online (https://uhh.de/inf-sp-storm-wind).
    SNAP: Efficient Extraction of Private Properties with Poisoning. (arXiv:2208.12348v2 [cs.LG] UPDATED)
Property inference attacks allow an adversary to extract global properties of the training dataset from a machine learning model. Such attacks have privacy implications for data owners sharing their datasets to train machine learning models. Several existing approaches for property inference attacks against deep neural networks have been proposed, but they all rely on the attacker training a large number of shadow models, which induces a large computational overhead. In this paper, we consider the setting of property inference attacks in which the attacker can poison a subset of the training dataset and query the trained target model. Motivated by our theoretical analysis of model confidences under poisoning, we design an efficient property inference attack, SNAP, which achieves a higher attack success rate while requiring less poisoning than the state-of-the-art poisoning-based property inference attack by Mahloujifar et al. For example, on the Census dataset, SNAP achieves a 34% higher success rate than Mahloujifar et al. while being 56.5x faster. We also extend our attack to infer whether a certain property was present at all during training and to efficiently estimate the exact proportion of a property of interest. We evaluate our attack on several properties of varying proportions from four datasets and demonstrate SNAP's generality and effectiveness. An open-source implementation of SNAP can be found at https://github.com/johnmath/snap-sp23.
    Improved Financial Forecasting via Quantum Machine Learning. (arXiv:2306.12965v1 [q-fin.ST])
    Quantum algorithms have the potential to enhance machine learning across a variety of domains and applications. In this work, we show how quantum machine learning can be used to improve financial forecasting. First, we use classical and quantum Determinantal Point Processes to enhance Random Forest models for churn prediction, improving precision by almost 6%. Second, we design quantum neural network architectures with orthogonal and compound layers for credit risk assessment, which match classical performance with significantly fewer parameters. Our results demonstrate that leveraging quantum ideas can effectively enhance the performance of machine learning, both today as quantum-inspired classical ML solutions, and even more in the future, with the advent of better quantum hardware.
    Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory Weighting. (arXiv:2306.13085v1 [cs.LG])
Most offline reinforcement learning (RL) algorithms return a target policy maximizing a trade-off between (1) the expected performance gain over the behavior policy that collected the dataset, and (2) the risk stemming from the out-of-distribution-ness of the induced state-action occupancy. It follows that the performance of the target policy is strongly related to the performance of the behavior policy and, thus, the trajectory return distribution of the dataset. We show that in mixed datasets consisting mostly of low-return trajectories and only a minority of high-return trajectories, state-of-the-art offline RL algorithms are overly restrained by the low-return trajectories and fail to exploit the high-performing trajectories to the fullest. To overcome this issue, we show that, in deterministic MDPs with stochastic initial states, the dataset sampling can be re-weighted to induce an artificial dataset whose behavior policy has a higher return. This re-weighted sampling strategy may be combined with any offline RL algorithm. We further show that the opportunity for performance improvement over the behavior policy correlates with the positive-sided variance of the returns of the trajectories in the dataset. We empirically show that while CQL, IQL, and TD3+BC achieve only a part of this potential policy improvement, these same algorithms combined with our re-weighted sampling strategy fully exploit the dataset. Furthermore, we empirically demonstrate that, despite its theoretical limitation, the approach may still be effective in stochastic environments. The code is available at https://github.com/Improbable-AI/harness-offline-rl.
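As a rough illustration of the re-weighted sampling idea, the sketch below up-weights high-return trajectories via a softmax over returns before handing indices to any offline RL algorithm; the softmax weighting and temperature are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def reweighted_trajectory_sampler(returns, temperature=1.0, n_samples=1024, seed=0):
    """Sample trajectory indices with probability increasing in return.

    `returns[i]` is the scalar return of trajectory i. A softmax over
    returns induces an artificial dataset whose implicit behavior policy
    has a higher return than uniform sampling would.
    """
    returns = np.asarray(returns, dtype=float)
    z = (returns - returns.max()) / max(temperature, 1e-8)
    probs = np.exp(z) / np.exp(z).sum()
    rng = np.random.default_rng(seed)
    return rng.choice(len(returns), size=n_samples, p=probs)

# Mixed dataset: many low-return, few high-return trajectories.
returns = np.concatenate([np.full(900, 10.0), np.full(100, 90.0)])
idx = reweighted_trajectory_sampler(returns, temperature=10.0)
print(returns[idx].mean())  # far above the uniform-sampling mean (18.0)
```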
    Can a single image processing algorithm work equally well across all phases of DCE-MRI?. (arXiv:2306.12988v1 [eess.IV])
Image segmentation and registration are particularly challenging when applied to dynamic contrast-enhanced MRI (DCE-MRI) sequences. The contrast agent causes rapid changes in intensity in the region of interest and elsewhere, which can lead to false positive predictions for segmentation tasks and confound the image registration similarity metric. While it is widely assumed that contrast changes increase the difficulty of these tasks, to our knowledge no work has quantified these effects. In this paper we examine the effect of training with different ratios of contrast-enhanced (CE) data on two popular tasks: segmentation with nnU-Net and Mask R-CNN, and registration using VoxelMorph and VTN. We experimented further by strategically using the available datasets through pretraining and fine-tuning with different splits of data. We found that to create a generalisable model, pretraining with CE data and fine-tuning with non-CE data gave the best result. This finding could extend to other deep-learning-based image processing tasks on DCE-MRI and provide significant improvements to model performance.
    Enhancing variational quantum state diagonalization using reinforcement learning techniques. (arXiv:2306.11086v2 [quant-ph] UPDATED)
The development of variational quantum algorithms is crucial for the application of NISQ computers. Such algorithms require short quantum circuits, which are more amenable to implementation on near-term hardware, and many such methods have been developed. Of particular interest is the so-called variational quantum state diagonalization method, which constitutes an important algorithmic subroutine and can be used directly for working with data encoded in quantum states. In particular, it can be applied to discern the features of quantum states, such as entanglement properties of a system, or in quantum machine learning algorithms. In this work, we tackle the problem of designing a very shallow quantum circuit, required in the quantum state diagonalization task, by utilizing reinforcement learning. To achieve this, we employ a novel encoding method that addresses the circuit-depth optimization problem within a reinforcement learning framework. We demonstrate that our approach provides a solid approximation to the diagonalization task while using a small number of gates. The circuits proposed by the reinforcement learning methods are shallower than the standard variational quantum state diagonalization algorithm, and thus can be used in situations where the depth of quantum circuits is limited by the hardware capabilities.
    AdCraft: An Advanced Reinforcement Learning Benchmark Environment for Search Engine Marketing Optimization. (arXiv:2306.11971v2 [cs.LG] UPDATED)
    We introduce AdCraft, a novel benchmark environment for the Reinforcement Learning (RL) community distinguished by its stochastic and non-stationary properties. The environment simulates bidding and budgeting dynamics within Search Engine Marketing (SEM), a digital marketing technique utilizing paid advertising to enhance the visibility of websites on search engine results pages (SERPs). The performance of SEM advertisement campaigns depends on several factors, including keyword selection, ad design, bid management, budget adjustments, and performance monitoring. Deep RL recently emerged as a potential strategy to optimize campaign profitability within the complex and dynamic landscape of SEM but it requires substantial data, which may be costly or infeasible to acquire in practice. Our customizable environment enables practitioners to assess and enhance the robustness of RL algorithms pertinent to SEM bid and budget management without such costs. Through a series of experiments within the environment, we demonstrate the challenges imposed by sparsity and non-stationarity on agent convergence and performance. We hope these challenges further encourage discourse and development around effective strategies for managing real-world uncertainties.
    Exploring the Training Robustness of Distributional Reinforcement Learning against Noisy State Observations. (arXiv:2109.08776v5 [cs.LG] UPDATED)
In real scenarios, state observations that an agent observes may contain measurement errors or adversarial noise, misleading the agent to take suboptimal actions or even collapse while training. In this paper, we study the training robustness of distributional Reinforcement Learning (RL), a class of state-of-the-art methods that estimate the whole distribution, as opposed to only the expectation, of the total return. Firstly, we validate the contraction of distributional Bellman operators in the State-Noisy Markov Decision Process (SN-MDP), a typical tabular case that incorporates both random and adversarial state observation noise. In the noisy setting with function approximation, we then analyze the vulnerability of the least-squares loss in expectation-based RL with either linear or nonlinear function approximation. By contrast, we theoretically characterize the bounded gradient norm of the distributional RL loss based on the categorical parameterization equipped with the KL divergence. The resulting stable gradients during optimization account for distributional RL's better training robustness against state observation noise. Finally, extensive experiments on a suite of environments verify that distributional RL is less vulnerable to both random and adversarial noisy state observations compared with its expectation-based counterpart.
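To make the bounded-gradient argument concrete, here is a minimal sketch of the categorical parameterization with a KL objective, in the style of C51 (the atom count and the fixed target are illustrative assumptions): the gradient with respect to the logits is softmax(logits) minus the target distribution, so each component is bounded regardless of the scale of the returns.

```python
import torch

n_atoms = 51
logits = torch.randn(n_atoms, requires_grad=True)    # predicted return distribution (pre-softmax)
target = torch.softmax(torch.randn(n_atoms), dim=0)  # projected Bellman target (fixed here)

# KL(target || pred) reduces to cross-entropy up to a constant; its
# gradient w.r.t. the logits is (softmax(logits) - target), each
# component of which lies in (-1, 1) no matter how large the returns are.
loss = -(target * torch.log_softmax(logits, dim=0)).sum()
loss.backward()
print(logits.grad.abs().max() <= 1.0)  # True: gradients stay bounded
```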
    Rapid building damage assessment workflow: An implementation for the 2023 Rolling Fork, Mississippi tornado event. (arXiv:2306.12589v1 [cs.CV])
Rapid and accurate building damage assessments from high-resolution satellite imagery following a natural disaster are essential to inform and optimize first responder efforts. However, performing such building damage assessments in an automated manner is non-trivial due to the challenges posed by variations in disaster-specific damage, diversity in satellite imagery, and the dearth of extensive, labeled datasets. To circumvent these issues, this paper introduces a human-in-the-loop workflow for rapidly training building damage assessment models after a natural disaster. This article details a case study using this workflow, executed in partnership with the American Red Cross during a tornado event in Rolling Fork, Mississippi in March 2023. The output from our human-in-the-loop modeling process achieved a precision of 0.86 and recall of 0.80 for damaged buildings when compared to ground truth data collected post-disaster. This workflow was implemented end-to-end in under 2 hours per satellite imagery scene, highlighting its potential for real-time deployment.
    A Comparison of Time-based Models for Multimodal Emotion Recognition. (arXiv:2306.13076v1 [cs.LG])
Emotion recognition has become an important research topic in the field of human-computer interaction. Studies using audio and video to understand emotions have focused mainly on analyzing facial expressions and have classified six basic emotions. In this study, the performance of different sequence models in multi-modal emotion recognition was compared. The audio and images were first processed by multi-layered CNN models, and the outputs of these models were fed into various sequence models: GRU, Transformer, LSTM, and max pooling. Accuracy, precision, and F1 score were calculated for all models on the multi-modal CREMA-D dataset. In this comparison, the GRU-based architecture achieved the best F1 score at 0.640, the LSTM-based architecture the best precision at 0.699, and the max-pooling-based architecture the best recall (sensitivity) at 0.620. Overall, the sequence models achieved performance close to one another.
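A minimal sketch of the pipeline shape under discussion: per-frame CNN features fed into a GRU that pools them over time. The toy encoder, layer sizes, and fusion scheme are placeholders rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class CnnGruClassifier(nn.Module):
    """Frames -> per-frame CNN embedding -> GRU -> emotion logits."""
    def __init__(self, n_classes=6, feat_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(                 # tiny per-frame encoder
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.gru = nn.GRU(feat_dim, 64, batch_first=True)
        self.head = nn.Linear(64, n_classes)

    def forward(self, frames):                    # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        _, h = self.gru(feats)                    # h: (num_layers, B, 64)
        return self.head(h[-1])

logits = CnnGruClassifier()(torch.randn(2, 8, 3, 64, 64))
print(logits.shape)  # torch.Size([2, 6])
```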
    IGB: Addressing The Gaps In Labeling, Features, Heterogeneity, and Size of Public Graph Datasets for Deep Learning Research. (arXiv:2302.13522v2 [cs.LG] UPDATED)
Graph neural networks (GNNs) have shown high potential for a variety of real-world, challenging applications, but one of the major obstacles in GNN research is the lack of large-scale flexible datasets. Most existing public datasets for GNNs are relatively small, which limits the ability of GNNs to generalize to unseen data. The few existing large-scale graph datasets provide very limited labeled data. This makes it difficult to determine if a GNN model's low accuracy on unseen data is inherently due to insufficient training data or if the model failed to generalize. Additionally, datasets used to train GNNs need to offer flexibility to enable a thorough study of the impact of various factors while training GNN models. In this work, we introduce the Illinois Graph Benchmark (IGB), a research dataset tool that developers can use to train, scrutinize and systematically evaluate GNN models with high fidelity. IGB includes both homogeneous and heterogeneous academic graphs of enormous sizes, with more than 40% of their nodes labeled. Compared to the largest graph datasets publicly available, IGB provides over 162x more labeled data for deep learning practitioners and developers to create and evaluate models with higher accuracy. The IGB dataset is a collection of academic graphs designed to be flexible, enabling the study of various GNN architectures, embedding generation techniques, and system performance issues for node classification tasks. IGB is open-sourced, supports DGL and PyG frameworks, and comes with releases of the raw text that we believe will foster emerging language model and GNN research projects. An early public version of IGB is available at https://github.com/IllinoisGraphBenchmark/IGB-Datasets.
    Efficient Partitioning Method of Large-Scale Public Safety Spatio-Temporal Data based on Information Loss Constraints. (arXiv:2306.12857v1 [cs.LG])
The storage, management, and application of massive spatio-temporal data are widely applied in various practical scenarios, including public safety. However, due to the unique spatio-temporal distribution characteristics of real-world data, most existing methods have limitations in terms of the spatio-temporal proximity of data and load balancing in distributed storage. Therefore, this paper proposes an efficient partitioning method of large-scale public safety spatio-temporal data based on information loss constraints (IFL-LSTP). The IFL-LSTP model specifically targets large-scale spatio-temporal point data by combining the spatio-temporal partitioning module (STPM) with the graph partitioning module (GPM). This approach can significantly reduce the scale of data while maintaining the model's accuracy, in order to improve the partitioning efficiency. It can also ensure the load balancing of distributed storage while maintaining spatio-temporal proximity of the data partitioning results. This method provides a new solution for distributed storage of massive spatio-temporal data. The experimental results on multiple real-world datasets demonstrate the effectiveness and superiority of IFL-LSTP.
    GIMLET: A Unified Graph-Text Model for Instruction-Based Molecule Zero-Shot Learning. (arXiv:2306.13089v1 [cs.LG])
Molecule property prediction has gained significant attention in recent years. The main bottleneck is label insufficiency caused by expensive lab experiments. In order to alleviate this issue and to better leverage textual knowledge for these tasks, this study investigates the feasibility of employing natural language instructions to accomplish molecule-related tasks in a zero-shot setting. We discover that existing molecule-text models perform poorly in this setting due to inadequate treatment of instructions and limited capacity for graphs. To overcome these issues, we propose GIMLET, which unifies language models for both graph and text data. By adopting generalized position embedding, our model is extended to encode both graph structures and instruction text without additional graph encoding modules. GIMLET also decouples the encoding of the graph from task instructions in the attention mechanism, enhancing the generalization of graph features across novel tasks. We construct a dataset consisting of more than two thousand molecule tasks with corresponding instructions derived from task descriptions. We pretrain GIMLET on the molecule tasks along with instructions, enabling the model to transfer effectively to a broad range of tasks. Experimental results demonstrate that GIMLET significantly outperforms molecule-text baselines in instruction-based zero-shot learning, even achieving results close to those of supervised GNN models on tasks such as ToxCast and MUV.
    Mitigating Discrimination in Insurance with Wasserstein Barycenters. (arXiv:2306.12912v1 [stat.ML])
The insurance industry is heavily reliant on predictions of risks based on characteristics of potential customers. Although the use of such models is common, researchers have long pointed out that the practice perpetuates discrimination based on sensitive features such as gender or race. Given that such discrimination can often be attributed to historical data biases, eliminating, or at least mitigating, it is desirable. With the shift from more traditional models to machine-learning-based predictions, calls for greater mitigation have grown anew, as simply excluding sensitive variables in the pricing process can be shown to be ineffective. In this article, we first investigate why predictions are a necessity within the industry and why correcting biases is not as straightforward as simply identifying a sensitive variable. We then propose to mitigate the biases through the use of Wasserstein barycenters instead of simple scaling. To demonstrate the effects and effectiveness of the approach, we apply it to real data and discuss its implications.
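In one dimension the Wasserstein-2 barycenter has a closed form: its quantile function is the weighted average of the groups' quantile functions. A hedged sketch of the resulting repair, mapping each group's scores onto that barycenter (a simplification of the article's methodology, with function and variable names of our choosing):

```python
import numpy as np

def barycenter_repair(scores, groups):
    """Map each group's scores onto the 1D Wasserstein-2 barycenter.

    For each observation we compute its within-group quantile level,
    then replace the score by the weighted average of all groups'
    values at that level -- the barycenter's quantile function.
    """
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    uniq, counts = np.unique(groups, return_counts=True)
    weights = counts / counts.sum()
    repaired = np.empty_like(scores)
    for g in uniq:
        mask = groups == g
        # empirical quantile level of each score within its own group
        q = np.searchsorted(np.sort(scores[mask]), scores[mask], side="right") / mask.sum()
        # barycenter value at that level: weighted mean of group quantiles
        repaired[mask] = sum(w * np.quantile(scores[groups == u], q)
                             for u, w in zip(uniq, weights))
    return repaired

rng = np.random.default_rng(1)
s = np.concatenate([rng.normal(0.4, 0.1, 500), rng.normal(0.6, 0.1, 500)])
g = np.array([0] * 500 + [1] * 500)
r = barycenter_repair(s, g)
print(r[g == 0].mean(), r[g == 1].mean())  # group means now coincide
```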
    Efficient Learning of Locomotion Skills through the Discovery of Diverse Environmental Trajectory Generator Priors. (arXiv:2210.04819v2 [cs.NE] UPDATED)
Data-driven, learning-based methods have recently been particularly successful at learning robust locomotion controllers for a variety of unstructured terrains. Prior work has shown that incorporating good locomotion priors in the form of trajectory generators (TGs) is effective at efficiently learning complex locomotion skills. However, defining a good, single TG as tasks and environments become increasingly complex remains a challenging problem, as it requires extensive tuning and risks reducing the effectiveness of the prior. In this paper, we present Evolved Environmental Trajectory Generators (EETG), a method that learns a diverse set of specialised locomotion priors using Quality-Diversity algorithms while maintaining a single policy within the Policies Modulating TG (PMTG) architecture. The results demonstrate that EETG enables a quadruped robot to successfully traverse a wide range of environments, such as slopes, stairs, rough terrain, and balance beams. Our experiments show that learning a diverse set of specialized TG priors is significantly (5 times) more efficient than using a single, fixed prior when dealing with a wide range of environments.
    FuXi: A cascade machine learning forecasting system for 15-day global weather forecast. (arXiv:2306.12873v1 [physics.ao-ph])
    Over the past few years, due to the rapid development of machine learning (ML) models for weather forecasting, state-of-the-art ML models have shown superior performance compared to the European Centre for Medium-Range Weather Forecasts (ECMWF)'s high-resolution forecast (HRES) in 10-day forecasts at a spatial resolution of 0.25 degree. However, the challenge remains to perform comparably to the ECMWF ensemble mean (EM) in 15-day forecasts. Previous studies have demonstrated the importance of mitigating the accumulation of forecast errors for effective long-term forecasts. Despite numerous efforts to reduce accumulation errors, including autoregressive multi-time step loss, using a single model is found to be insufficient to achieve optimal performance in both short and long lead times. Therefore, we present FuXi, a cascaded ML weather forecasting system that provides 15-day global forecasts with a temporal resolution of 6 hours and a spatial resolution of 0.25 degree. FuXi is developed using 39 years of the ECMWF ERA5 reanalysis dataset. The performance evaluation, based on latitude-weighted root mean square error (RMSE) and anomaly correlation coefficient (ACC), demonstrates that FuXi has comparable forecast performance to ECMWF EM in 15-day forecasts, making FuXi the first ML-based weather forecasting system to accomplish this achievement.
    DEPAC: a Corpus for Depression and Anxiety Detection from Speech. (arXiv:2306.12443v1 [eess.AS])
Mental distress conditions such as depression and anxiety account for the largest proportion of the global burden of disease. Automated diagnosis systems for such disorders, empowered by recent innovations in Artificial Intelligence, can pave the way to reducing the suffering of affected individuals. Development of such systems requires information-rich and balanced corpora. In this work, we introduce a novel mental distress analysis audio dataset, DEPAC, labeled based on established thresholds on standard depression and anxiety screening tools. This large dataset comprises multiple speech tasks per individual, as well as relevant demographic information. Alongside the dataset, we present a feature set consisting of hand-curated acoustic and linguistic features which were found effective in identifying signs of mental illness in human speech. Finally, we justify the quality and effectiveness of our proposed audio corpus and feature set in predicting depression severity by comparing the performance of baseline machine learning models built on this dataset with baseline models trained on other well-known depression corpora.
    The Power of Linear Combinations: Learning with Random Convolutions. (arXiv:2301.11360v2 [cs.CV] UPDATED)
    Following the traditional paradigm of convolutional neural networks (CNNs), modern CNNs manage to keep pace with more recent, for example transformer-based, models by not only increasing model depth and width but also the kernel size. This results in large amounts of learnable model parameters that need to be handled during training. While following the convolutional paradigm with the according spatial inductive bias, we question the significance of \emph{learned} convolution filters. In fact, our findings demonstrate that many contemporary CNN architectures can achieve high test accuracies without ever updating randomly initialized (spatial) convolution filters. Instead, simple linear combinations (implemented through efficient $1\times 1$ convolutions) suffice to effectively recombine even random filters into expressive network operators. Furthermore, these combinations of random filters can implicitly regularize the resulting operations, mitigating overfitting and enhancing overall performance and robustness. Conversely, retaining the ability to learn filter updates can impair network performance. Lastly, although we only observe relatively small gains from learning $3\times 3$ convolutions, the learning gains increase proportionally with kernel size, owing to the non-idealities of the independent and identically distributed (\textit{i.i.d.}) nature of default initialization techniques.
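A minimal sketch of the core construction, assuming a PyTorch-style setup: frozen, randomly initialized $3\times 3$ filters recombined by a learnable $1\times 1$ convolution; channel counts are illustrative.

```python
import torch
import torch.nn as nn

class RandomConvBlock(nn.Module):
    """Frozen random 3x3 filters followed by a learnable 1x1 recombination."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.spatial = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False)
        self.spatial.weight.requires_grad_(False)   # random filters, never updated
        self.mix = nn.Conv2d(out_ch, out_ch, 1)     # the only learned weights
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.mix(self.spatial(x)))

block = RandomConvBlock(3, 32)
trainable = sum(p.numel() for p in block.parameters() if p.requires_grad)
print(trainable)  # only the 1x1 conv (and its bias) is trained
```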
    Breaking the Curse of Multiagents in a Large State Space: RL in Markov Games with Independent Linear Function Approximation. (arXiv:2302.03673v3 [cs.LG] UPDATED)
We propose a new model, independent linear Markov game, for multi-agent reinforcement learning with a large state space and a large number of agents. This is a class of Markov games with independent linear function approximation, where each agent has its own function approximation for the state-action value functions that are marginalized by other players' policies. We design new algorithms for learning the Markov coarse correlated equilibria (CCE) and Markov correlated equilibria (CE) with sample complexity bounds that only scale polynomially with each agent's own function class complexity, thus breaking the curse of multiagents. In contrast, existing works for Markov games with function approximation have sample complexity bounds that scale with the size of the \emph{joint action space} when specialized to the canonical tabular Markov game setting; the joint action space is exponentially large in the number of agents. Our algorithms rely on two key technical innovations: (1) utilizing policy replay to tackle non-stationarity incurred by multiple agents and the use of function approximation; (2) separating learning Markov equilibria from exploration in the Markov games, which allows us to use the full-information no-regret learning oracle instead of the stronger bandit-feedback no-regret learning oracle used in the tabular setting. Furthermore, we propose an iterative-best-response type algorithm that can learn pure Markov Nash equilibria in independent linear Markov potential games. In the tabular case, by adapting the policy replay mechanism for independent linear Markov games, we propose an algorithm with $\widetilde{O}(\epsilon^{-2})$ sample complexity to learn Markov CCE, which improves the state-of-the-art result $\widetilde{O}(\epsilon^{-3})$ of Daskalakis et al. (2022), where $\epsilon$ is the desired accuracy, and also significantly improves the dependence on other problem parameters.
    Sum-Rate Maximization of RSMA-based Aerial Communications with Energy Harvesting: A Reinforcement Learning Approach. (arXiv:2306.12977v1 [cs.IT])
    In this letter, we investigate a joint power and beamforming design problem for rate-splitting multiple access (RSMA)-based aerial communications with energy harvesting, where a self-sustainable aerial base station serves multiple users by utilizing the harvested energy. Considering maximizing the sum-rate from the long-term perspective, we utilize a deep reinforcement learning (DRL) approach, namely the soft actor-critic algorithm, to restrict the maximum transmission power at each time based on the stochastic property of the channel environment, harvested energy, and battery power information. Moreover, for designing precoders and power allocation among all the private/common streams of the RSMA, we employ sequential least squares programming (SLSQP) using the Han-Powell quasi-Newton method to maximize the sum-rate for the given transmission power via DRL. Numerical results show the superiority of the proposed scheme over several baseline methods in terms of the average sum-rate performance.
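A toy sketch of the SLSQP step in isolation, assuming a simplified sum-rate objective over per-stream powers under a total-power budget (the actual problem additionally involves precoders and RSMA common/private streams; the gains and budget below are made-up numbers):

```python
import numpy as np
from scipy.optimize import minimize

gains = np.array([2.0, 1.0, 0.5])   # illustrative per-stream channel gains
P_max = 10.0                        # power budget chosen by the DRL agent

def neg_sum_rate(p):
    """Negative sum-rate: SLSQP minimizes, so we negate the objective."""
    return -np.sum(np.log2(1.0 + gains * p))

res = minimize(
    neg_sum_rate,
    x0=np.full(3, P_max / 3),                 # start from an equal split
    method="SLSQP",
    bounds=[(0.0, P_max)] * 3,
    constraints=[{"type": "ineq", "fun": lambda p: P_max - p.sum()}],
)
print(res.x, -res.fun)  # water-filling-like allocation and its sum-rate
```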
    Towards Explainable Evaluation Metrics for Machine Translation. (arXiv:2306.13041v1 [cs.CL])
Unlike classical lexical overlap metrics such as BLEU, most current evaluation metrics for machine translation (for example, COMET or BERTScore) are based on black-box large language models. They often achieve strong correlations with human judgments, but recent research indicates that the lower-quality classical metrics remain dominant, one of the potential reasons being that their decision processes are more transparent. To foster more widespread acceptance of novel high-quality metrics, explainability thus becomes crucial. In this concept paper, we identify key properties as well as key goals of explainable machine translation metrics and provide a comprehensive synthesis of recent techniques, relating them to our established goals and properties. In this context, we also discuss the latest state-of-the-art approaches to explainable metrics based on generative models such as ChatGPT and GPT-4. Finally, we contribute a vision of next-generation approaches, including natural language explanations. We hope that our work can help catalyze and guide future research on explainable evaluation metrics and, indirectly, also contribute to better and more transparent machine translation systems.
    StrainNet: Predicting crystal structure elastic properties using SE(3)-equivariant graph neural networks. (arXiv:2306.12818v1 [cond-mat.dis-nn])
Accurately predicting the elastic properties of crystalline solids is vital for computational materials science. However, traditional atomistic-scale ab initio approaches are computationally intensive, especially for studying complex materials with a large number of atoms in a unit cell. We introduce a novel data-driven approach to efficiently predict the elastic properties of crystal structures using SE(3)-equivariant graph neural networks (GNNs). This approach yields important scalar elastic moduli with accuracy comparable to recent data-driven studies. Importantly, our symmetry-aware GNN model also enables the prediction of the strain energy density (SED) and the associated elastic constants, the fundamental tensorial quantities that are significantly influenced by a material's crystallographic group. The model consistently distinguishes independent elements of SED tensors, in accordance with the symmetry of the crystal structures. Finally, our deep learning model possesses meaningful latent features, offering an interpretable prediction of the elastic properties.
    In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD. (arXiv:2306.12900v1 [cs.LG])
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations. As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks. Additionally, performing inference at runtime requires non-trivial coupling of ML framework libraries with simulation codes. This work offers a solution to both limitations by simplifying this coupling and enabling in situ training and inference workflows on heterogeneous clusters. Leveraging SmartSim, the presented framework deploys a database to store data and ML models in memory, thus circumventing the file system. On the Polaris supercomputer, we demonstrate perfect scaling efficiency of the data-transfer and inference costs up to the full machine size, thanks to a novel co-located deployment of the database. Moreover, we train an autoencoder in situ from a turbulent flow simulation, showing that the framework overhead is negligible relative to a solver time step and training epoch.
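The in-situ pattern can be sketched with the SmartRedis client that accompanies SmartSim; the database address, tensor name, and array shape below are assumptions, and in a real deployment SmartSim's experiment API launches the database and sets the connection environment for the applications.

```python
import numpy as np
from smartredis import Client

# Connect to the in-memory database deployed by SmartSim (address is an
# assumption; normally SmartSim exposes it via the SSDB environment
# variable for the launched applications).
client = Client(address="127.0.0.1:6379", cluster=False)

# Simulation side: stream a snapshot of the flow field instead of
# writing it to the file system.
snapshot = np.random.rand(64, 64, 64).astype(np.float32)
client.put_tensor("flow_step_0001", snapshot)

# Training side: pull the snapshot back for an in situ training step.
batch = client.get_tensor("flow_step_0001")
print(batch.shape)
```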
    Transferable Curricula through Difficulty Conditioned Generators. (arXiv:2306.13028v1 [cs.AI])
Advancements in reinforcement learning (RL) have demonstrated superhuman performance in complex tasks such as Starcraft, Go, and Chess. However, knowledge transfer from artificial "experts" to humans remains a significant challenge. A promising avenue for such transfer is the use of curricula. Recent methods in curriculum generation focus on training RL agents efficiently, yet such methods rely on surrogate measures to track student progress and are not suited for training robots in the real world (or, more ambitiously, humans). In this paper, we introduce a method named Parameterized Environment Response Model (PERM) that shows promising results in training RL agents in parameterized environments. Inspired by Item Response Theory, PERM seeks to model the difficulty of environments and the ability of RL agents directly. Given that RL agents and humans are trained more efficiently within the "zone of proximal development", our method generates a curriculum by matching the difficulty of an environment to the current ability of the student. In addition, PERM can be trained offline and does not employ non-stationary measures of student ability, making it suitable for transfer between students. We demonstrate PERM's ability to represent the environment parameter space, and show that training RL agents with PERM produces strong performance in deterministic environments. Lastly, we show that our method is transferable between students without any sacrifice in training quality.
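The Item Response Theory connection can be made concrete with the one-parameter (Rasch) model, where a student of ability $\theta$ succeeds on an environment of difficulty $b$ with probability $\Pr(\text{success} \mid \theta, b) = \sigma(\theta - b) = 1/(1 + e^{-(\theta - b)})$; this is the standard formulation, not necessarily PERM's exact response model. Matching environment difficulty to the current ability estimate keeps this probability near $1/2$, the regime in which each attempt is most informative about the student.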
    Otter-Knowledge: benchmarks of multimodal knowledge graph representation learning from different sources for drug discovery. (arXiv:2306.12802v1 [cs.LG])
Recent research in representation learning utilizes large databases of proteins or molecules to acquire knowledge of drug and protein structures through unsupervised learning techniques. These pre-trained representations have proven to significantly enhance the accuracy of subsequent tasks, such as predicting the affinity between drugs and target proteins. In this study, we demonstrate that by incorporating knowledge graphs from diverse sources and modalities into the sequences or SMILES representation, we can further enrich the representation and achieve state-of-the-art results on established benchmark datasets. We provide preprocessed and integrated data obtained from 7 public sources, which encompass over 30M triples. We also make available the pre-trained models based on this data, along with the reported outcomes of their performance on three widely-used benchmark datasets for drug-target binding affinity prediction found in the Therapeutic Data Commons (TDC) benchmarks, as well as the source code for training models on the benchmark datasets. Our objective in releasing these pre-trained models, accompanied by clean data for model pretraining and benchmark results, is to encourage research in knowledge-enhanced representation learning.
    Towards Antisymmetric Neural Ansatz Separation. (arXiv:2208.03264v3 [cs.LG] UPDATED)
    We study separations between two fundamental models (or \emph{Ans\"atze}) of antisymmetric functions, that is, functions $f$ of the form $f(x_{\sigma(1)}, \ldots, x_{\sigma(N)}) = \text{sign}(\sigma)f(x_1, \ldots, x_N)$, where $\sigma$ is any permutation. These arise in the context of quantum chemistry, and are the basic modeling tool for wavefunctions of Fermionic systems. Specifically, we consider two popular antisymmetric Ans\"atze: the Slater representation, which leverages the alternating structure of determinants, and the Jastrow ansatz, which augments Slater determinants with a product by an arbitrary symmetric function. We construct an antisymmetric function in $N$ dimensions that can be efficiently expressed in Jastrow form, yet provably cannot be approximated by Slater determinants unless there are exponentially (in $N^2$) many terms. This represents the first explicit quantitative separation between these two Ans\"atze.
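For reference, the two Ans\"atze in standard form (notation follows common usage in the quantum chemistry literature; the paper's precise definitions may add normalization details): the Slater representation expresses $f$ as a sum of determinants, $f(x_1, \ldots, x_N) = \sum_{k=1}^{K} \det\big[\phi^{(k)}_i(x_j)\big]_{i,j=1}^{N}$, while the Jastrow ansatz multiplies a determinant by an arbitrary symmetric prefactor, $f(x_1, \ldots, x_N) = J(x_1, \ldots, x_N)\,\det\big[\phi_i(x_j)\big]_{i,j=1}^{N}$ with $J$ symmetric. The separation result exhibits an $f$ that is cheap in the second form yet requires exponentially many (in $N^2$) terms $K$ in the first.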
    The quantum cost function concentration dependency on the parametrization expressivity. (arXiv:2301.06883v2 [quant-ph] UPDATED)
Although we are currently in the era of noisy intermediate-scale quantum devices, several studies are being conducted with the aim of bringing machine learning to the quantum domain. Currently, quantum variational circuits are one of the main strategies used to build such models. However, despite their widespread use, we still do not know the minimum resources needed to create a quantum machine learning model. In this article, we analyze how the expressiveness of the parametrization affects the cost function. We analytically show that the more expressive the parametrization is, the more the cost function will tend to concentrate around a value that depends both on the chosen observable and on the number of qubits used. To this end, we first obtain a relationship between the expressiveness of the parametrization and the mean value of the cost function. Afterwards, we relate the expressivity of the parametrization to the variance of the cost function. Finally, we show some numerical simulation results that confirm our theoretical-analytical predictions. To the best of our knowledge, this is the first time that these two important aspects of quantum neural networks are explicitly connected.
    Algorithmic decision making methods for fair credit scoring. (arXiv:2209.07912v3 [cs.LG] UPDATED)
The effectiveness of machine learning in evaluating the creditworthiness of loan applicants has long been demonstrated. However, there is concern that the use of automated decision-making processes may result in unequal treatment of groups or individuals, potentially leading to discriminatory outcomes. This paper seeks to address this issue by evaluating the effectiveness of 12 leading bias mitigation methods across 5 different fairness metrics, as well as assessing their accuracy and potential profitability for financial institutions. Through our analysis, we have identified the challenges associated with achieving fairness while maintaining accuracy and profitability, and have highlighted both the most successful and least successful mitigation methods. Ultimately, our research serves to bridge the gap between experimental machine learning and its practical applications in the finance industry.
    An Interactive Interface for Novel Class Discovery in Tabular Data. (arXiv:2306.12919v1 [cs.LG])
Novel Class Discovery (NCD) is the problem of trying to discover novel classes in an unlabeled set, given a labeled set of different but related classes. The majority of NCD methods proposed so far only deal with image data, despite tabular data being among the most widely used types of data in practical applications. To interpret the results of clustering or NCD algorithms, data scientists need to understand the domain- and application-specific attributes of tabular data. This task is difficult and can often only be performed by a domain expert. We therefore propose an interactive interface that allows a domain expert to easily run state-of-the-art NCD algorithms on tabular data. With minimal data science knowledge, interpretable results can be generated.
    Adaptive Bernstein Change Detector for High-Dimensional Data Streams. (arXiv:2306.12974v1 [cs.LG])
    Change detection is of fundamental importance when analyzing data streams. Detecting changes both quickly and accurately enables monitoring and prediction systems to react, e.g., by issuing an alarm or by updating a learning algorithm. However, detecting changes is challenging when observations are high-dimensional. In high-dimensional data, change detectors should not only be able to identify when changes happen, but also in which subspace they occur. Ideally, one should also quantify how severe they are. Our approach, ABCD, has these properties. ABCD learns an encoder-decoder model and monitors its accuracy over a window of adaptive size. ABCD derives a change score based on Bernstein's inequality to detect deviations in terms of accuracy, which indicate changes. Our experiments demonstrate that ABCD outperforms its best competitor by at least 8% and up to 23% in F1-score on average. It can also accurately estimate changes' subspace, together with a severity measure that correlates with the ground truth.
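A sketch of the Bernstein-style test underlying such a detector: compare mean reconstruction accuracy before and after a candidate change point and flag a change when the gap exceeds the combined deviation bounds. The split-window statistic, constants, and thresholds below are illustrative assumptions, not ABCD's exact procedure.

```python
import numpy as np

def bernstein_bound(n, var, rng_width, delta=0.05):
    """High-probability deviation of an empirical mean (Bernstein's inequality).

    For n i.i.d. samples bounded in an interval of width `rng_width`
    with variance `var`, the empirical mean deviates from the true mean
    by at most this bound, with probability at least 1 - delta.
    """
    log_term = np.log(2.0 / delta)
    return np.sqrt(2.0 * var * log_term / n) + 2.0 * rng_width * log_term / (3.0 * n)

def change_detected(window, split, delta=0.05):
    """Flag a change if pre/post means differ by more than their joint bound."""
    a, b = window[:split], window[split:]
    eps = (bernstein_bound(len(a), a.var(), 1.0, delta)
           + bernstein_bound(len(b), b.var(), 1.0, delta))
    return abs(a.mean() - b.mean()) > eps

rng = np.random.default_rng(2)
accs = np.concatenate([rng.uniform(0.85, 0.95, 300),   # stable regime
                       rng.uniform(0.55, 0.70, 300)])  # after a drift
print(change_detected(accs, split=300))  # True
```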
    Context-lumpable stochastic bandits. (arXiv:2306.13053v1 [cs.LG])
We consider a contextual bandit problem with $S$ contexts and $A$ actions. In each round $t = 1, 2, \dots$ the learner observes a random context and chooses an action based on its past experience. The learner then observes a random reward whose mean is a function of the context and the action for the round. Under the assumption that the contexts can be lumped into $r \le \min\{S, A\}$ groups such that the mean reward for the various actions is the same for any two contexts that are in the same group, we give an algorithm that outputs an $\epsilon$-optimal policy after using at most $\widetilde{O}(r(S + A)/\epsilon^2)$ samples with high probability and provide a matching $\widetilde{\Omega}(r(S + A)/\epsilon^2)$ lower bound. In the regret minimization setting, we give an algorithm whose cumulative regret up to time $T$ is bounded by $\widetilde{O}(\sqrt{r^3(S + A)T})$. To the best of our knowledge, we are the first to show the near-optimal sample complexity in the PAC setting and the $\widetilde{O}(\sqrt{\mathrm{poly}(r)(S + A)T})$ minimax regret in the online setting for this problem. We also show that our algorithms can be applied to more general low-rank bandits and get improved regret bounds in some scenarios.
    Efficient Sensor Placement from Regression with Sparse Gaussian Processes in Continuous and Discrete Spaces. (arXiv:2303.00028v2 [cs.RO] UPDATED)
    We present a novel approach based on sparse Gaussian processes (SGPs) to address the sensor placement problem for monitoring spatially (or spatiotemporally) correlated phenomena such as temperature and precipitation. Existing Gaussian process (GP) based sensor placement approaches use GPs with known kernel function parameters to model a phenomenon and subsequently optimize the sensor locations in a discretized representation of the environment. In our approach, we fit an SGP with known kernel function parameters to randomly sampled unlabeled locations in the environment and show that the learned inducing points of the SGP inherently solve the sensor placement problem in continuous spaces. Using SGPs avoids discretizing the environment and reduces the computation cost from cubic to linear complexity. When restricted to a candidate set of sensor placement locations, we can use greedy sequential selection algorithms on the SGP's optimization bound to find good solutions. We also present an approach to efficiently map our continuous space solutions to discrete solution spaces using the assignment problem, which gives us discrete sensor placements optimized in unison. Moreover, we generalize our approach to model sensors with non-point field-of-view and integrated observations by leveraging the inherent properties of GPs and SGPs. Our experimental results on three real-world datasets show that our approaches generate solution placements that result in reconstruction quality that is consistently on par or better than the prior state-of-the-art approach while being significantly faster. Our computationally efficient approaches will enable both large-scale sensor placement, and fast sensor placement for informative path planning problems.
    Prior Density Learning in Variational Bayesian Phylogenetic Parameters Inference. (arXiv:2302.02522v2 [q-bio.PE] UPDATED)
    The advances in variational inference are providing promising paths in Bayesian estimation problems. These advances make variational phylogenetic inference an alternative approach to Markov Chain Monte Carlo methods for approximating the phylogenetic posterior. However, one of the main drawbacks of such approaches is the modelling of the prior through fixed distributions, which could bias the posterior approximation if they are distant from the current data distribution. In this paper, we propose an approach and an implementation framework to relax the rigidity of the prior densities by learning their parameters using a gradient-based method and a neural network-based parameterization. We applied this approach for branch lengths and evolutionary parameters estimation under several Markov chain substitution models. The results of performed simulations show that the approach is powerful in estimating branch lengths and evolutionary model parameters. They also show that a flexible prior model could provide better results than a predefined prior model. Finally, the results highlight that using neural networks improves the initialization of the optimization of the prior density parameters.
    Robust Semantic Segmentation: Strong Adversarial Attacks and Fast Training of Robust Models. (arXiv:2306.12941v1 [cs.CV])
    While a large amount of work has focused on designing adversarial attacks against image classifiers, only a few methods exist to attack semantic segmentation models. We show that attacking segmentation models presents task-specific challenges, for which we propose novel solutions. Our final evaluation protocol outperforms existing methods, and shows that those can overestimate the robustness of the models. Additionally, so far adversarial training, the most successful way for obtaining robust image classifiers, could not be successfully applied to semantic segmentation. We argue that this is because the task to be learned is more challenging, and requires significantly higher computational effort than for image classification. As a remedy, we show that by taking advantage of recent advances in robust ImageNet classifiers, one can train adversarially robust segmentation models at limited computational cost by fine-tuning robust backbones.
    A prior regularized full waveform inversion using generative diffusion models. (arXiv:2306.12776v1 [physics.geo-ph])
    Full waveform inversion (FWI) has the potential to provide high-resolution subsurface model estimations. However, due to limitations in observation, e.g., regional noise, limited shots or receivers, and band-limited data, it is hard to obtain the desired high-resolution model with FWI. To address this challenge, we propose a new paradigm for FWI regularized by generative diffusion models. Specifically, we pre-train a diffusion model in a fully unsupervised manner on a prior velocity model distribution that represents our expectations of the subsurface and then adapt it to the seismic observations by incorporating the FWI into the sampling process of the generative diffusion models. What makes diffusion models uniquely appropriate for such an implementation is that the generative process retains the form and dimensions of the velocity model. Numerical examples demonstrate that our method can outperform the conventional FWI with only negligible additional computational cost. Even in cases of very sparse observations or observations with strong noise, the proposed method could still reconstruct a high-quality subsurface model. Thus, we can incorporate our prior expectations of the solutions in an efficient manner. We further test this approach on field data, which demonstrates the effectiveness of the proposed method.
    An efficient and straightforward online quantization method for a data stream through remove-birth updating. (arXiv:2306.12574v1 [cs.LG])
    The growth of network-connected devices is creating an explosion of data, known as big data, and posing significant challenges to efficient data analysis. This data is generated continuously, creating a dynamic flow known as a data stream. The characteristics of a data stream may change dynamically, and this change is known as concept drift. Consequently, a method for handling data streams must efficiently reduce their volume while dynamically adapting to these changing characteristics. This paper proposes a simple online vector quantization method for concept drift. The proposed method identifies and replaces units with low win probability through remove-birth updating, thus achieving a rapid adaptation to concept drift. Furthermore, the results of this study show that the proposed method can generate minimal dead units even in the presence of concept drift. This study also suggests that some metrics calculated from the proposed method will be helpful for drift detection.
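A minimal sketch of the remove-birth idea, assuming a winner-take-all codebook update: win counts are tracked, and the unit with the lowest win probability is periodically removed and re-born at the most recent sample. Hyperparameters and the rebirth rule are illustrative, not the paper's exact settings.

```python
import numpy as np

def online_vq(stream, k=16, lr=0.05, check_every=500, min_win_prob=0.01, seed=0):
    """Online vector quantization with remove-birth updating (sketch)."""
    rng = np.random.default_rng(seed)
    codebook = rng.standard_normal((k, stream.shape[1]))
    wins = np.zeros(k)
    for t, x in enumerate(stream, 1):
        w = np.argmin(((codebook - x) ** 2).sum(axis=1))   # winning unit
        codebook[w] += lr * (x - codebook[w])              # move toward sample
        wins[w] += 1
        if t % check_every == 0:                           # remove-birth step
            probs = wins / wins.sum()
            dead = np.argmin(probs)
            if probs[dead] < min_win_prob:
                codebook[dead] = x                         # re-birth at recent data
                wins[dead] = wins.mean()
    return codebook

# Drifting stream: the cluster center jumps halfway through.
rng = np.random.default_rng(3)
part1 = rng.normal(0.0, 0.3, (2000, 2))
part2 = rng.normal(5.0, 0.3, (2000, 2))
cb = online_vq(np.vstack([part1, part2]))
print((np.linalg.norm(cb - 5.0, axis=1) < 1.0).sum(), "units near the new cluster")
```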
    Towards More Realistic Membership Inference Attacks on Large Diffusion Models. (arXiv:2306.12983v1 [cs.LG])
    Generative diffusion models, including Stable Diffusion and Midjourney, can generate visually appealing, diverse, and high-resolution images for various applications. These models are trained on billions of internet-sourced images, raising significant concerns about the potential unauthorized use of copyright-protected images. In this paper, we examine whether it is possible to determine if a specific image was used in the training set, a problem known in the cybersecurity community and referred to as a membership inference attack. Our focus is on Stable Diffusion, and we address the challenge of designing a fair evaluation framework to answer this membership question. We propose a methodology to establish a fair evaluation setup and apply it to Stable Diffusion, enabling potential extensions to other generative models. Utilizing this evaluation setup, we execute membership attacks (both known and newly introduced). Our research reveals that previously proposed evaluation setups do not provide a full understanding of the effectiveness of membership inference attacks. We conclude that the membership inference attack remains a significant challenge for large diffusion models (often deployed as black-box systems), indicating that related privacy and copyright issues will persist in the foreseeable future.
    Toward Leveraging Pre-Trained Self-Supervised Frontends for Automatic Singing Voice Understanding Tasks: Three Case Studies. (arXiv:2306.12714v1 [cs.SD])
Automatic singing voice understanding tasks, such as singer identification, singing voice transcription, and singing technique classification, benefit from data-driven approaches that utilize deep learning techniques. These approaches work well even under the rich diversity of vocal and noisy samples owing to their representation ability. However, the limited availability of labeled data remains a significant obstacle to achieving satisfactory performance. In recent years, self-supervised learning (SSL) models have been trained using large amounts of unlabeled data in the fields of speech processing and music classification. By fine-tuning these models for target tasks, performance comparable to conventional supervised learning can be achieved with limited training data. Therefore, in this paper, we investigate the effectiveness of SSL models for various singing voice recognition tasks. As an initial exploration, we report the results of experiments comparing SSL models on three different tasks (i.e., singer identification, singing voice transcription, and singing technique classification) and discuss these findings. Experimental results show that each SSL model achieves performance comparable to, and sometimes better than, state-of-the-art methods on each task. We also conducted a layer-wise analysis to further understand the behavior of the SSL models.
    Concept-aware clustering for decentralized deep learning under temporal shift. (arXiv:2306.12768v1 [cs.LG])
Decentralized deep learning requires dealing with non-iid data across clients, which may also change over time due to temporal shifts. While non-iid data has been extensively studied in distributed settings, temporal shifts have received no attention. To the best of our knowledge, we are the first to tackle the novel and challenging problem of decentralized learning with non-iid and dynamic data. We propose a novel algorithm that can automatically discover and adapt to the evolving concepts in the network, without any prior knowledge or estimation of the number of concepts. We evaluate our algorithm on standard benchmark datasets and demonstrate that it outperforms previous methods for decentralized learning.
    Accelerated Training via Incrementally Growing Neural Networks using Variance Transfer and Learning Rate Adaptation. (arXiv:2306.12700v1 [cs.LG])
    We develop an approach to efficiently grow neural networks, within which parameterization and optimization strategies are designed by considering their effects on the training dynamics. Unlike existing growing methods, which follow simple replication heuristics or utilize auxiliary gradient-based local optimization, we craft a parameterization scheme which dynamically stabilizes weight, activation, and gradient scaling as the architecture evolves, and maintains the inference functionality of the network. To address the optimization difficulty resulting from imbalanced training effort distributed to subnetworks fading in at different growth phases, we propose a learning rate adaption mechanism that rebalances the gradient contribution of these separate subcomponents. Experimental results show that our method achieves comparable or better accuracy than training large fixed-size models, while saving a substantial portion of the original computation budget for training. We demonstrate that these gains translate into real wall-clock training speedups.
    Instruct-FinGPT: Financial Sentiment Analysis by Instruction Tuning of General-Purpose Large Language Models. (arXiv:2306.12659v1 [cs.CL])
    Sentiment analysis is a vital tool for uncovering insights from financial articles, news, and social media, shaping our understanding of market movements. Despite the impressive capabilities of large language models (LLMs) in financial natural language processing (NLP), they still struggle with accurately interpreting numerical values and grasping financial context, limiting their effectiveness in predicting financial sentiment. In this paper, we introduce a simple yet effective instruction tuning approach to address these issues. By transforming a small portion of supervised financial sentiment analysis data into instruction data and fine-tuning a general-purpose LLM with this method, we achieve remarkable advancements in financial sentiment analysis. In the experiment, our approach outperforms state-of-the-art supervised sentiment analysis models, as well as widely used LLMs like ChatGPT and LLaMAs, particularly in scenarios where numerical understanding and contextual comprehension are vital.
    Hierarchical Neural Simulation-Based Inference Over Event Ensembles. (arXiv:2306.12584v1 [stat.ML])
    When analyzing real-world data it is common to work with event ensembles, which comprise sets of observations that collectively constrain the parameters of an underlying model of interest. Such models often have a hierarchical structure, where "local" parameters impact individual events and "global" parameters influence the entire dataset. We introduce practical approaches for optimal dataset-wide probabilistic inference in cases where the likelihood is intractable, but simulations can be realized via forward modeling. We construct neural estimators for the likelihood(-ratio) or posterior and show that explicitly accounting for the model's hierarchical structure can lead to tighter parameter constraints. We ground our discussion using case studies from the physical sciences, focusing on examples from particle physics (particle collider data) and astrophysics (strong gravitational lensing observations).
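Concretely, for events $x_1, \ldots, x_n$ with local parameters $z_i$ and global parameters $\theta$, the dataset-wide likelihood factorizes in the standard hierarchical form $p(x_{1:n} \mid \theta) = \prod_{i=1}^{n} \int p(x_i \mid z_i, \theta)\, p(z_i \mid \theta)\, dz_i$; the integrals over the local parameters are what make the likelihood intractable, and the neural estimators target this quantity (or the corresponding likelihood ratio, or posterior) using only forward simulations.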
    MP3: Movement Primitive-Based (Re-)Planning Policy. (arXiv:2306.12729v1 [cs.LG])
We introduce a novel deep reinforcement learning (RL) approach called Movement Primitive-based (Re-)Planning Policy (MP3). By integrating movement primitives (MPs) into the deep RL framework, MP3 enables the generation of smooth trajectories throughout the whole learning process while effectively learning from sparse and non-Markovian rewards. Additionally, MP3 maintains the capability to adapt to changes in the environment during execution. Although many early successes in robot RL have been achieved by combining RL with MPs, these approaches are often limited to learning single stroke-based motions, lacking the ability to adapt to task variations or adjust motions during execution. Building upon our previous work, which introduced an episode-based RL method for the non-linear adaptation of MP parameters to different task variations, this paper extends the approach to incorporate replanning strategies. This allows adaptation of the MP parameters throughout motion execution, addressing the lack of online motion adaptation in stochastic domains requiring feedback. We compared our approach against state-of-the-art deep RL and RL-with-MPs methods. The results demonstrate improved performance in sophisticated, sparse-reward settings and in domains requiring replanning.
    Multi-Objective Hull Form Optimization with CAD Engine-based Deep Learning Physics for 3D Flow Prediction. (arXiv:2306.12915v1 [cs.LG])
In this work, we propose a built-in Deep Learning Physics Optimization (DLPO) framework to set up a shape optimization study of the Duisburg Test Case (DTC) container vessel. We present two different applications: (1) sensitivity analysis to detect the most promising generic basis hull shapes, and (2) multi-objective optimization to quantify the trade-off between optimal hull forms. The DLPO framework allows design iterations to be evaluated automatically in an end-to-end manner. We achieved these results by coupling Extrality's Deep Learning Physics (DLP) model to a CAD engine and an optimizer. Our proposed DLP model is trained on full 3D volume data coming from RANS simulations, and it can provide accurate and high-quality 3D flow predictions in real time, which makes it a good evaluator for optimizing new container vessel designs with respect to hydrodynamic efficiency. In particular, it is able to recover the forces acting on the vessel by integration over the hull surface with a mean relative error of $3.84\% \pm 2.179\%$ on the total resistance. Each iteration takes only 20 seconds, thus leading to a drastic saving of time and engineering effort, while delivering valuable insight into the performance of the vessel, including RANS-like detailed flow information. We conclude that the DLPO framework is a promising tool to accelerate the ship design process and lead to more efficient ships with better hydrodynamic performance.
    Class-Incremental Learning based on Label Generation. (arXiv:2306.12619v1 [cs.CL])
    Despite the great success of pre-trained language models, it is still a challenge to use these models for continual learning, especially for the class-incremental learning (CIL) setting due to catastrophic forgetting (CF). This paper reports our finding that if we formulate CIL as a continual label generation problem, CF is drastically reduced and the generalizable representations of pre-trained models can be better retained. We thus propose a new CIL method (VAG) that also leverages the sparsity of vocabulary to focus the generation and creates pseudo-replay samples by using label semantics. Experimental results show that VAG outperforms baselines by a large margin.
    RobustNeuralNetworks.jl: a Package for Machine Learning and Data-Driven Control with Certified Robustness. (arXiv:2306.12612v1 [cs.LG])
    Neural networks are typically sensitive to small input perturbations, leading to unexpected or brittle behaviour. We present RobustNeuralNetworks.jl: a Julia package for neural network models that are constructed to naturally satisfy a set of user-defined robustness constraints. The package is based on the recently proposed Recurrent Equilibrium Network (REN) and Lipschitz-Bounded Deep Network (LBDN) model classes, and is designed to interface directly with Julia's most widely-used machine learning package, Flux.jl. We discuss the theory behind our model parameterization, give an overview of the package, and provide a tutorial demonstrating its use in image classification, reinforcement learning, and nonlinear state-observer design.
    Towards quantum enhanced adversarial robustness in machine learning. (arXiv:2306.12688v1 [quant-ph])
Machine learning algorithms are powerful tools for data-driven tasks such as image classification and feature detection; however, their vulnerability to adversarial examples - input samples manipulated to fool the algorithm - remains a serious challenge. The integration of machine learning with quantum computing has the potential to yield tools offering not only better accuracy and computational efficiency, but also superior robustness against adversarial attacks. Indeed, recent work has employed quantum mechanical phenomena to defend against adversarial attacks, spurring the rapid development of the field of quantum adversarial machine learning (QAML) and potentially yielding a new source of quantum advantage. Despite promising early results, there remain challenges towards building robust real-world QAML tools. In this review we discuss recent progress in QAML and identify key challenges. We also suggest future research directions which could determine the route to practicality for QAML approaches as quantum computing hardware scales up and noise levels are reduced.
    Finite-time Lyapunov exponents of deep neural networks. (arXiv:2306.12548v1 [cond-mat.dis-nn])
    We compute how small input perturbations affect the output of deep neural networks, exploring an analogy between deep networks and dynamical systems, where the growth or decay of local perturbations is characterised by finite-time Lyapunov exponents. We show that the maximal exponent forms geometrical structures in input space, akin to coherent structures in dynamical systems. Ridges of large positive exponents divide input space into different regions that the network associates with different classes. These ridges visualise the geometry that deep networks construct in input space, shedding light on the fundamental mechanisms underlying their learning capabilities.
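The quantity in question can be sketched directly: for a network viewed layer-by-layer as a dynamical map, the maximal finite-time Lyapunov exponent at an input follows from the singular values of the input-output Jacobian. A minimal PyTorch sketch, with an arbitrary toy network standing in for the models studied:

```python
# Sketch: maximal finite-time Lyapunov exponent of a deep net at one input point,
# estimated from the singular values of the input-output Jacobian.
import torch

L = 6                                              # number of layers ("time" steps)
layers = [torch.nn.Linear(10, 10) for _ in range(L)]
net = torch.nn.Sequential(*[m for l in layers for m in (l, torch.nn.Tanh())])

x = torch.randn(10)
J = torch.autograd.functional.jacobian(net, x)     # 10 x 10 Jacobian at x
sigma_max = torch.linalg.svdvals(J)[0]             # largest singular value
lyapunov = torch.log(sigma_max) / L                # perturbation growth rate per layer
print(float(lyapunov))
```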
    Identifying and Disentangling Spurious Features in Pretrained Image Representations. (arXiv:2306.12673v1 [cs.LG])
Neural networks employ spurious correlations in their predictions, resulting in decreased performance when these correlations do not hold. Recent works suggest fixing pretrained representations and training a classification head that does not use spurious features. We investigate how spurious features are represented in pretrained representations and explore strategies for removing information about spurious features. Considering the Waterbirds dataset and a few pretrained representations, we find that even with full knowledge of spurious features, their removal is not straightforward due to the entangled nature of the representations. To address this, we propose a linear autoencoder training method to separate the representation into core, spurious, and other features. We propose two effective spurious feature removal approaches that are applied to the encoding and significantly improve classification performance measured by worst-group accuracy.
    Pure Exploration in Bandits with Linear Constraints. (arXiv:2306.12774v1 [cs.LG])
We address the problem of identifying the optimal policy with a fixed confidence level in a multi-armed bandit setup when the arms are subject to linear constraints. Unlike the standard best-arm identification problem, which is well studied, the optimal policy in this case may not be deterministic and could mix between several arms. This changes the geometry of the problem, which we characterize via an information-theoretic lower bound. We introduce two asymptotically optimal algorithms for this setting, one based on the Track-and-Stop method and the other based on a game-theoretic approach. Both algorithms attempt to track an optimal allocation that is based on the lower bound and computed via a weighted projection onto the boundary of a normal cone. Finally, we provide empirical results that validate our bounds and visualize how constraints change the hardness of the problem.
    Vec2Vec: A Compact Neural Network Approach for Transforming Text Embeddings with High Fidelity. (arXiv:2306.12689v1 [cs.CL])
Vector embeddings have become ubiquitous tools for many language-related tasks. A leading embedding model is OpenAI's text-ada-002, which can embed approximately 6,000 words into a 1,536-dimensional vector. While powerful, text-ada-002 is not open source and is only available via API. We trained a simple neural network to convert open-source 768-dimensional MPNet embeddings into text-ada-002 embeddings. We compiled a subset of 50,000 online food reviews, calculated MPNet and text-ada-002 embeddings for each review, and trained a simple neural network for 75 epochs. The neural network was designed to predict the corresponding text-ada-002 embedding for a given MPNet embedding. Our model achieved an average cosine similarity of 0.932 on 10,000 unseen reviews in our held-out test dataset. We manually assessed the quality of our predicted embeddings for vector search over text-ada-002-embedded reviews. While not as good as real text-ada-002 embeddings, predicted embeddings were able to retrieve highly relevant reviews. Our final model, Vec2Vec, is lightweight (<80 MB) and fast. Future steps include training a neural network with a more sophisticated architecture and a larger dataset of paired embeddings to achieve greater performance. The ability to convert between and align embedding spaces may be helpful for interoperability, limiting dependence on proprietary models, protecting data privacy, reducing costs, and offline operations.
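Since the abstract describes the setup fairly completely, a minimal sketch is easy to write down; the layer sizes and optimizer settings below are assumptions, as the abstract does not specify the architecture:

```python
# Sketch of the Vec2Vec idea: a small MLP mapping 768-d MPNet embeddings to
# 1536-d text-ada-002 embeddings, trained with a cosine-similarity objective.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(768, 1024), nn.GELU(), nn.Linear(1024, 1536))
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

def step(mpnet_batch, ada_batch):          # paired embeddings for the same texts
    pred = model(mpnet_batch)
    loss = 1 - nn.functional.cosine_similarity(pred, ada_batch).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Usage with random stand-in pairs:
print(step(torch.randn(32, 768), torch.randn(32, 1536)))
```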
    Slimmable Encoders for Flexible Split DNNs in Bandwidth and Resource Constrained IoT Systems. (arXiv:2306.12691v1 [cs.LG])
The execution of large deep neural networks (DNN) at mobile edge devices requires considerable consumption of critical resources, such as energy, while imposing demands on hardware capabilities. In approaches based on edge computing, the execution of the models is offloaded to a compute-capable device positioned at the edge of 5G infrastructures. The main issue with the latter class of approaches is the need to transport information-rich signals over wireless links with limited and time-varying capacity. The recent split computing paradigm attempts to resolve this impasse by distributing the execution of DNN models across the layers of the systems to reduce the amount of data to be transmitted while imposing minimal computing load on mobile devices. In this context, we propose a novel split computing approach based on slimmable ensemble encoders. The key advantage of our design is the ability to adapt computational load and transmitted data size in real-time with minimal overhead and time. This is in contrast with existing approaches, where the same adaptation requires costly context switching and model loading. Moreover, our model outperforms existing solutions in terms of compression efficacy and execution time, especially in the context of weak mobile devices. We present a comprehensive comparison with the most advanced split computing solutions, as well as an experimental evaluation on GPU-less devices.
    Memory-Query Tradeoffs for Randomized Convex Optimization. (arXiv:2306.12534v1 [cs.DS])
    We show that any randomized first-order algorithm which minimizes a $d$-dimensional, $1$-Lipschitz convex function over the unit ball must either use $\Omega(d^{2-\delta})$ bits of memory or make $\Omega(d^{1+\delta/6-o(1)})$ queries, for any constant $\delta\in (0,1)$ and when the precision $\epsilon$ is quasipolynomially small in $d$. Our result implies that cutting plane methods, which use $\tilde{O}(d^2)$ bits of memory and $\tilde{O}(d)$ queries, are Pareto-optimal among randomized first-order algorithms, and quadratic memory is required to achieve optimal query complexity for convex optimization.
    Investigating Poor Performance Regions of Black Boxes: LIME-based Exploration in Sepsis Detection. (arXiv:2306.12507v1 [cs.LG])
    Interpreting machine learning models remains a challenge, hindering their adoption in clinical settings. This paper proposes leveraging Local Interpretable Model-Agnostic Explanations (LIME) to provide interpretable descriptions of black box classification models in high-stakes sepsis detection. By analyzing misclassified instances, significant features contributing to suboptimal performance are identified. The analysis reveals regions where the classifier performs poorly, allowing the calculation of error rates within these regions. This knowledge is crucial for cautious decision-making in sepsis detection and other critical applications. The proposed approach is demonstrated using the eICU dataset, effectively identifying and visualizing regions where the classifier underperforms. By enhancing interpretability, our method promotes the adoption of machine learning models in clinical practice, empowering informed decision-making and mitigating risks in critical scenarios.
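A minimal sketch of the workflow described above, using the `lime` package's tabular explainer on misclassified instances; the classifier and data are synthetic stand-ins, as the eICU features are not public:

```python
# Explain misclassified instances of a black-box classifier with LIME.
import numpy as np
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the (private) clinical features used in the paper.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

explainer = LimeTabularExplainer(X_train, mode="classification",
                                 class_names=["no sepsis", "sepsis"])
for i in np.where(clf.predict(X_test) != y_test)[0][:5]:   # misclassified cases
    exp = explainer.explain_instance(X_test[i], clf.predict_proba, num_features=5)
    print(exp.as_list())                                   # features driving the error
```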
    On Exploring Node-feature and Graph-structure Diversities for Node Drop Graph Pooling. (arXiv:2306.12726v1 [cs.LG])
A pooling operation is essential for effective graph-level representation learning, where node drop pooling has become a mainstream graph pooling technique. However, current node drop pooling methods usually keep the top-k nodes according to their significance scores, ignoring graph diversity in terms of node features and graph structures, thus resulting in suboptimal graph-level representations. To address this issue, we propose a novel plug-and-play score scheme, referred to as MID, which consists of a Multidimensional score space with two operations, i.e., flIpscore and Dropscore. Specifically, the multidimensional score space depicts the significance of nodes through multiple criteria; the flipscore encourages the maintenance of dissimilar node features; and the dropscore forces the model to notice diverse graph structures instead of being stuck in significant local structures. To evaluate the effectiveness of our proposed MID, we perform extensive experiments by applying it to a wide variety of recent node drop pooling methods, including TopKPool, SAGPool, GSAPool, and ASAP. Specifically, the proposed MID can efficiently and consistently achieve about 2.8% average improvement over the above four methods on seventeen real-world graph classification datasets, including four social datasets (IMDB-BINARY, IMDB-MULTI, REDDIT-BINARY, and COLLAB) and thirteen biochemical datasets (D&D, PROTEINS, NCI1, MUTAG, PTC-MR, NCI109, ENZYMES, MUTAGENICITY, FRANKENSTEIN, HIV, BBBP, TOXCAST, and TOX21). Code is available at https://github.com/whuchuang/mid.
    Generalized Low-Rank Update: Model Parameter Bounds for Low-Rank Training Data Modifications. (arXiv:2306.12670v1 [stat.ML])
    In this study, we have developed an incremental machine learning (ML) method that efficiently obtains the optimal model when a small number of instances or features are added or removed. This problem holds practical importance in model selection, such as cross-validation (CV) and feature selection. Among the class of ML methods known as linear estimators, there exists an efficient model update framework called the low-rank update that can effectively handle changes in a small number of rows and columns within the data matrix. However, for ML methods beyond linear estimators, there is currently no comprehensive framework available to obtain knowledge about the updated solution within a specific computational complexity. In light of this, our study introduces a method called the Generalized Low-Rank Update (GLRU) which extends the low-rank update framework of linear estimators to ML methods formulated as a certain class of regularized empirical risk minimization, including commonly used methods such as SVM and logistic regression. The proposed GLRU method not only expands the range of its applicability but also provides information about the updated solutions with a computational complexity proportional to the amount of dataset changes. To demonstrate the effectiveness of the GLRU method, we conduct experiments showcasing its efficiency in performing cross-validation and feature selection compared to other baseline methods.
    Learning Conditional Instrumental Variable Representation for Causal Effect Estimation. (arXiv:2306.12453v1 [cs.LG])
    One of the fundamental challenges in causal inference is to estimate the causal effect of a treatment on its outcome of interest from observational data. However, causal effect estimation often suffers from the impacts of confounding bias caused by unmeasured confounders that affect both the treatment and the outcome. The instrumental variable (IV) approach is a powerful way to eliminate the confounding bias from latent confounders. However, the existing IV-based estimators require a nominated IV, and for a conditional IV (CIV) the corresponding conditioning set too, for causal effect estimation. This limits the application of IV-based estimators. In this paper, by leveraging the advantage of disentangled representation learning, we propose a novel method, named DVAE.CIV, for learning and disentangling the representations of CIV and the representations of its conditioning set for causal effect estimations from data with latent confounders. Extensive experimental results on both synthetic and real-world datasets demonstrate the superiority of the proposed DVAE.CIV method against the existing causal effect estimators.
    Neural Multigrid Memory For Computational Fluid Dynamics. (arXiv:2306.12545v1 [physics.flu-dyn])
    Turbulent flow simulation plays a crucial role in various applications, including aircraft and ship design, industrial process optimization, and weather prediction. In this paper, we propose an advanced data-driven method for simulating turbulent flow, representing a significant improvement over existing approaches. Our methodology combines the strengths of Video Prediction Transformer (VPTR) (Ye & Bilodeau, 2022) and Multigrid Architecture (MgConv, MgResnet) (Ke et al., 2017). VPTR excels in capturing complex spatiotemporal dependencies and handling large input data, making it a promising choice for turbulent flow prediction. Meanwhile, Multigrid Architecture utilizes multiple grids with different resolutions to capture the multiscale nature of turbulent flows, resulting in more accurate and efficient simulations. Through our experiments, we demonstrate the effectiveness of our proposed approach, named MGxTransformer, in accurately predicting velocity, temperature, and turbulence intensity for incompressible turbulent flows across various geometries and flow conditions. Our results exhibit superior accuracy compared to other baselines, while maintaining computational efficiency.
    Aligning Synthetic Medical Images with Clinical Knowledge using Human Feedback. (arXiv:2306.12438v1 [eess.IV])
    Generative models capable of capturing nuanced clinical features in medical images hold great promise for facilitating clinical data sharing, enhancing rare disease datasets, and efficiently synthesizing annotated medical images at scale. Despite their potential, assessing the quality of synthetic medical images remains a challenge. While modern generative models can synthesize visually-realistic medical images, the clinical validity of these images may be called into question. Domain-agnostic scores, such as FID score, precision, and recall, cannot incorporate clinical knowledge and are, therefore, not suitable for assessing clinical sensibility. Additionally, there are numerous unpredictable ways in which generative models may fail to synthesize clinically plausible images, making it challenging to anticipate potential failures and manually design scores for their detection. To address these challenges, this paper introduces a pathologist-in-the-loop framework for generating clinically-plausible synthetic medical images. Starting with a diffusion model pretrained using real images, our framework comprises three steps: (1) evaluating the generated images by expert pathologists to assess whether they satisfy clinical desiderata, (2) training a reward model that predicts the pathologist feedback on new samples, and (3) incorporating expert knowledge into the diffusion model by using the reward model to inform a finetuning objective. We show that human feedback significantly improves the quality of synthetic images in terms of fidelity, diversity, utility in downstream applications, and plausibility as evaluated by experts.
    Empirical Risk Minimization with Shuffled SGD: A Primal-Dual Perspective and Improved Bounds. (arXiv:2306.12498v1 [math.OC])
    Stochastic gradient descent (SGD) is perhaps the most prevalent optimization method in modern machine learning. Contrary to the empirical practice of sampling from the datasets without replacement and with (possible) reshuffling at each epoch, the theoretical counterpart of SGD usually relies on the assumption of sampling with replacement. It is only very recently that SGD with sampling without replacement -- shuffled SGD -- has been analyzed. For convex finite sum problems with $n$ components and under the $L$-smoothness assumption for each component function, there are matching upper and lower bounds, under sufficiently small -- $\mathcal{O}(\frac{1}{nL})$ -- step sizes. Yet those bounds appear too pessimistic -- in fact, the predicted performance is generally no better than for full gradient descent -- and do not agree with the empirical observations. In this work, to narrow the gap between the theory and practice of shuffled SGD, we sharpen the focus from general finite sum problems to empirical risk minimization with linear predictors. This allows us to take a primal-dual perspective and interpret shuffled SGD as a primal-dual method with cyclic coordinate updates on the dual side. Leveraging this perspective, we prove a fine-grained complexity bound that depends on the data matrix and is never worse than what is predicted by the existing bounds. Notably, our bound can predict much faster convergence than the existing analyses -- by a factor of the order of $\sqrt{n}$ in some cases. We empirically demonstrate that on common machine learning datasets our bound is indeed much tighter. We further show how to extend our analysis to convex nonsmooth problems, with similar improvements.
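For concreteness, a minimal sketch of shuffled SGD on a least-squares objective with a linear predictor, the regime the analysis targets; data and step size are illustrative:

```python
# Shuffled (without-replacement) SGD for empirical risk minimization with a
# linear predictor; least-squares loss used for illustration.
import numpy as np

rng = np.random.default_rng(0)
A, b = rng.normal(size=(1000, 20)), rng.normal(size=1000)   # data matrix, targets
w, lr = np.zeros(20), 1e-3

for epoch in range(50):
    for i in rng.permutation(len(b)):       # one pass, sampling without replacement
        grad = (A[i] @ w - b[i]) * A[i]     # gradient of 0.5 * (a_i . w - b_i)^2
        w -= lr * grad
```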
    Modeling T1 Resting-State MRI Variants Using Convolutional Neural Networks in Diagnosis of OCD. (arXiv:2306.12435v1 [q-bio.NC])
    Obsessive-compulsive disorder (OCD) presents itself as a highly debilitating disorder. The disorder has common associations with the prefrontal cortex and the glutamate receptor known as Metabotropic Glutamate Receptor 5 (mGluR5). This receptor has been observed to demonstrate higher levels of signaling from positron emission tomography scans measured by its distribution volume ratios in mice. Despite this evidence, studies are unable to fully verify the involvement of mGluR5 as more empirical data is needed. Computational modeling methods were used as a means of validation for previous hypotheses involving mGluR5. The inadequacies in relation to the causal factor of OCD were answered by utilizing T1 resting-state magnetic resonance imaging (TRS-MRI) scans of patients suffering from schizophrenia, major depressive disorder, and obsessive-compulsive disorder. Because comorbid cases often occur within these disorders, cross-comparative abilities become necessary to find distinctive characteristics. Two-dimensional convolutional neural networks alongside ResNet50 and MobileNet models were constructed and evaluated for efficiency. Activation heatmaps of TRS-MRI scans were outputted, allowing for transcriptomics analysis. Though, a lack of ability to predict OCD cases prevented gene expression analysis. Across all models, there was an 88.75\% validation accuracy for MDD, and 82.08\% validation accuracy for SZD under the framework of ResNet50 as well as novel computation. OCD yielded an accuracy rate of ~54.4\%. These results provided further evidence for the p factor theorem regarding mental disorders. Future work involves the application of transfer learning to bolster accuracy rates.
    Constant Memory Attention Block. (arXiv:2306.12599v1 [cs.LG])
Modern foundation model architectures rely on attention mechanisms to effectively capture context. However, these methods require linear or quadratic memory in terms of the number of inputs/datapoints, limiting their applicability in low-compute domains. In this work, we propose the Constant Memory Attention Block (CMAB), a novel general-purpose attention block that computes its output in constant memory and performs updates in constant computation. Highlighting CMAB's efficacy, we introduce methods for Neural Processes and Temporal Point Processes. Empirically, we show our proposed methods achieve results competitive with the state-of-the-art while being significantly more memory efficient.
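The abstract does not spell out CMAB's construction, but the key enabling idea, computing an attention output over a stream of inputs without ever materializing all of them, can be illustrated with the standard online-softmax trick:

```python
# Illustration only (not the CMAB construction itself): attention of a fixed query
# against a stream of key/value pairs, computed in constant memory by keeping
# running softmax accumulators and processing one chunk at a time.
import torch

d = 64
q = torch.randn(d)                       # one fixed query
m = torch.tensor(float("-inf"))          # running max of attention scores
den = torch.tensor(0.0)                  # running softmax denominator
num = torch.zeros(d)                     # running weighted sum of values

for _ in range(1000):                    # stream of 1000 chunks, never stored at once
    k, v = torch.randn(32, d), torch.randn(32, d)
    s = k @ q / d**0.5                   # chunk scores
    m_new = torch.maximum(m, s.max())
    scale = torch.exp(m - m_new)         # rescale old accumulators to the new max
    den = den * scale + torch.exp(s - m_new).sum()
    num = num * scale + torch.exp(s - m_new) @ v
    m = m_new

out = num / den                          # attention output in O(1) memory
```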
    Communication-Efficient Federated Learning through Importance Sampling. (arXiv:2306.12625v1 [cs.LG])
    The high communication cost of sending model updates from the clients to the server is a significant bottleneck for scalable federated learning (FL). Among existing approaches, state-of-the-art bitrate-accuracy tradeoffs have been achieved using stochastic compression methods -- in which the client $n$ sends a sample from a client-only probability distribution $q_{\phi^{(n)}}$, and the server estimates the mean of the clients' distributions using these samples. However, such methods do not take full advantage of the FL setup where the server, throughout the training process, has side information in the form of a pre-data distribution $p_{\theta}$ that is close to the client's distribution $q_{\phi^{(n)}}$ in Kullback-Leibler (KL) divergence. In this work, we exploit this closeness between the clients' distributions $q_{\phi^{(n)}}$'s and the side information $p_{\theta}$ at the server, and propose a framework that requires approximately $D_{KL}(q_{\phi^{(n)}}|| p_{\theta})$ bits of communication. We show that our method can be integrated into many existing stochastic compression frameworks such as FedPM, Federated SGLD, and QSGD to attain the same (and often higher) test accuracy with up to $50$ times reduction in the bitrate.
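A back-of-envelope sketch of the claimed bitrate: communicating a sample from the client distribution $q_{\phi^{(n)}}$ given server-side $p_{\theta}$ costs roughly $D_{KL}(q_{\phi^{(n)}}|| p_{\theta})$ bits. Below, illustrative Bernoulli (mask) distributions, as in FedPM-style compression:

```python
# Estimate the KL-based communication cost for Bernoulli mask distributions.
import numpy as np

rng = np.random.default_rng(0)
q = np.clip(rng.random(10_000), 1e-6, 1 - 1e-6)                   # client probabilities
p = np.clip(q + 0.05 * rng.standard_normal(10_000), 1e-6, 1 - 1e-6)  # server-side prior

kl_bits = np.sum(q * np.log2(q / p) + (1 - q) * np.log2((1 - q) / (1 - p)))
print(f"approx. {kl_bits:.0f} bits vs {len(q)} bits for a naive 1-bit-per-mask scheme")
```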
    Outlier-robust Estimation of a Sparse Linear Model Using Invexity. (arXiv:2306.12678v1 [cs.LG])
In this paper, we study the problem of estimating a sparse regression vector with correct support in the presence of outlier samples. The inconsistency of lasso-type methods is well known in this scenario. We propose a combinatorial version of outlier-robust lasso which also identifies clean samples. Subsequently, we use these clean samples to make a good estimate. We also provide a novel invex relaxation for the combinatorial problem and provide provable theoretical guarantees for this relaxation. Finally, we conduct experiments to validate our theory and compare our results against the standard lasso.
    Balanced Training of Energy-Based Models with Adaptive Flow Sampling. (arXiv:2306.00684v2 [cs.LG] UPDATED)
    Energy-based models (EBMs) are versatile density estimation models that directly parameterize an unnormalized log density. Although very flexible, EBMs lack a specified normalization constant of the model, making the likelihood of the model computationally intractable. Several approximate samplers and variational inference techniques have been proposed to estimate the likelihood gradients for training. These techniques have shown promising results in generating samples, but little attention has been paid to the statistical accuracy of the estimated density, such as determining the relative importance of different classes in a dataset. In this work, we propose a new maximum likelihood training algorithm for EBMs that uses a different type of generative model, normalizing flows (NF), which have recently been proposed to facilitate sampling. Our method fits an NF to an EBM during training so that an NF-assisted sampling scheme provides an accurate gradient for the EBMs at all times, ultimately leading to a fast sampler for generating new data.
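The training signal being approximated can be sketched as follows: the maximum-likelihood gradient of an EBM is the difference between energy gradients on data and on model samples, with the latter drawn here from the normalizing flow. The flow-fitting step is omitted and a stand-in sampler is used:

```python
# Sketch of NF-assisted maximum-likelihood training of an EBM:
# minimize E_data[E(x)] - E_model[E(x)], with model samples from a flow.
import torch
import torch.nn as nn

energy = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(energy.parameters(), lr=1e-4)

def ebm_step(x_data, flow_sample):
    x_model = flow_sample(256)                          # NF-assisted negative samples
    loss = energy(x_data).mean() - energy(x_model).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Usage with a stand-in "flow" that samples a Gaussian:
print(ebm_step(torch.randn(256, 2) + 2.0, lambda n: torch.randn(n, 2)))
```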
    MPSTAN: Metapopulation-based Spatio-Temporal Attention Network for Epidemic Forecasting. (arXiv:2306.12436v1 [cs.LG])
Accurate epidemic forecasting plays a vital role for governments in developing effective prevention measures for suppressing epidemics. Most of the present spatio-temporal models cannot provide a general framework for stable and accurate forecasting of epidemics with diverse evolution trends. Incorporating epidemiological domain knowledge ranging from single-patch to multi-patch into neural networks is expected to improve forecasting accuracy. However, relying solely on single-patch knowledge neglects inter-patch interactions, while constructing multi-patch knowledge is challenging without population mobility data. To address the aforementioned problems, we propose a novel hybrid model called Metapopulation-based Spatio-Temporal Attention Network (MPSTAN). This model aims to improve the accuracy of epidemic forecasting by incorporating multi-patch epidemiological knowledge into a spatio-temporal model and adaptively defining inter-patch interactions. Moreover, we incorporate inter-patch epidemiological knowledge into both the model construction and loss function to help the model learn epidemic transmission dynamics. Extensive experiments conducted on two representative datasets with different epidemiological evolution trends demonstrate that our proposed model outperforms the baselines and provides more accurate and stable short- and long-term forecasting. We confirm the effectiveness of domain knowledge in the learning model and investigate the impact of different ways of integrating domain knowledge on forecasting. We observe that using domain knowledge in both model construction and loss functions leads to more efficient forecasting, and selecting appropriate domain knowledge can improve accuracy further.
    Adversarial Training with Generated Data in High-Dimensional Regression: An Asymptotic Study. (arXiv:2306.12582v1 [stat.ML])
In recent years, studies such as Carmon et al. (2019), Gowal et al. (2021), and Xing et al. (2022) have demonstrated that incorporating additional real or generated data with pseudo-labels can enhance adversarial training through a two-stage training approach. In this paper, we perform a theoretical analysis of the asymptotic behavior of this method in high-dimensional linear regression. While a double-descent phenomenon can be observed in ridgeless training, with an appropriate $\mathcal{L}_2$ regularization, the two-stage adversarial training achieves better performance. Finally, we derive a shortcut cross-validation formula specifically tailored for the two-stage training method.
    Comparing deep learning models for volatility prediction using multivariate data. (arXiv:2306.12446v1 [q-fin.ST])
This study compares several deep learning-based forecasters in the task of volatility prediction using multivariate data, proceeding from simpler or shallower to deeper and more complex models, and compares them to the naive prediction and variations of classical GARCH models. Specifically, the volatility of five assets (i.e., S&P 500, NASDAQ 100, gold, silver, and oil) was predicted with the GARCH models, Multi-Layer Perceptrons, recurrent neural networks, Temporal Convolutional Networks, and the Temporal Fusion Transformer. In most cases the Temporal Fusion Transformer, followed by variants of the Temporal Convolutional Network, outperformed classical approaches and shallow networks. These experiments were repeated, and the difference between competing models was shown to be statistically significant, therefore encouraging their use in practice.
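As a concrete point of reference for the classical baselines, a GARCH(1,1) fit and forecast with the `arch` package might look as follows; the returns series below is a synthetic stand-in for the assets studied:

```python
# GARCH(1,1) baseline: fit and 5-step-ahead variance forecast.
import numpy as np
import pandas as pd
from arch import arch_model

rng = np.random.default_rng(0)
returns = pd.Series(rng.standard_t(df=5, size=2000))   # stand-in percent returns

res = arch_model(returns, mean="Constant", vol="Garch", p=1, q=1).fit(disp="off")
print(res.params)                                      # omega, alpha[1], beta[1]
print(res.forecast(horizon=5).variance.iloc[-1])       # 5-step conditional variance
```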
    Verifying Global Neural Network Specifications using Hyperproperties. (arXiv:2306.12495v1 [cs.LG])
    Current approaches to neural network verification focus on specifications that target small regions around known input data points, such as local robustness. Thus, using these approaches, we can not obtain guarantees for inputs that are not close to known inputs. Yet, it is highly likely that a neural network will encounter such truly unseen inputs during its application. We study global specifications that - when satisfied - provide guarantees for all potential inputs. We introduce a hyperproperty formalism that allows for expressing global specifications such as monotonicity, Lipschitz continuity, global robustness, and dependency fairness. Our formalism enables verifying global specifications using existing neural network verification approaches by leveraging capabilities for verifying general computational graphs. Thereby, we extend the scope of guarantees that can be provided using existing methods. Recent success in verifying specific global specifications shows that attaining strong guarantees for all potential data points is feasible.
    Semi-Implicit Denoising Diffusion Models (SIDDMs). (arXiv:2306.12511v1 [cs.LG])
    Despite the proliferation of generative models, achieving fast sampling during inference without compromising sample diversity and quality remains challenging. Existing models such as Denoising Diffusion Probabilistic Models (DDPM) deliver high-quality, diverse samples but are slowed by an inherently high number of iterative steps. The Denoising Diffusion Generative Adversarial Networks (DDGAN) attempted to circumvent this limitation by integrating a GAN model for larger jumps in the diffusion process. However, DDGAN encountered scalability limitations when applied to large datasets. To address these limitations, we introduce a novel approach that tackles the problem by matching implicit and explicit factors. More specifically, our approach involves utilizing an implicit model to match the marginal distributions of noisy data and the explicit conditional distribution of the forward diffusion. This combination allows us to effectively match the joint denoising distributions. Unlike DDPM but similar to DDGAN, we do not enforce a parametric distribution for the reverse step, enabling us to take large steps during inference. Similar to the DDPM but unlike DDGAN, we take advantage of the exact form of the diffusion process. We demonstrate that our proposed method obtains comparable generative performance to diffusion-based models and vastly superior results to models with a small number of sampling steps.
    HypeRS: Building a Hypergraph-driven ensemble Recommender System. (arXiv:2306.12800v1 [cs.IR])
    Recommender systems are designed to predict user preferences over collections of items. These systems process users' previous interactions to decide which items should be ranked higher to satisfy their desires. An ensemble recommender system can achieve great recommendation performance by effectively combining the decisions generated by individual models. In this paper, we propose a novel ensemble recommender system that combines predictions made by different models into a unified hypergraph ranking framework. This is the first time that hypergraph ranking has been employed to model an ensemble of recommender systems. Hypergraphs are generalizations of graphs where multiple vertices can be connected via hyperedges, efficiently modeling high-order relations. We differentiate real and predicted connections between users and items by assigning different hyperedge weights to individual recommender systems. We perform experiments using four datasets from the fields of movie, music and news media recommendation. The obtained results show that the ensemble hypergraph ranking method generates more accurate recommendations compared to the individual models and a weighted hybrid approach. The assignment of different hyperedge weights to the ensemble hypergraph further improves the performance compared to a setting with identical hyperedge weights.
    XAI-TRIS: Non-linear benchmarks to quantify ML explanation performance. (arXiv:2306.12816v1 [cs.LG])
    The field of 'explainable' artificial intelligence (XAI) has produced highly cited methods that seek to make the decisions of complex machine learning (ML) methods 'understandable' to humans, for example by attributing 'importance' scores to input features. Yet, a lack of formal underpinning leaves it unclear as to what conclusions can safely be drawn from the results of a given XAI method and has also so far hindered the theoretical verification and empirical validation of XAI methods. This means that challenging non-linear problems, typically solved by deep neural networks, presently lack appropriate remedies. Here, we craft benchmark datasets for three different non-linear classification scenarios, in which the important class-conditional features are known by design, serving as ground truth explanations. Using novel quantitative metrics, we benchmark the explanation performance of a wide set of XAI methods across three deep learning model architectures. We show that popular XAI methods are often unable to significantly outperform random performance baselines and edge detection methods. Moreover, we demonstrate that explanations derived from different model architectures can be vastly different; thus, prone to misinterpretation even under controlled conditions.
    On the Robustness of Generative Retrieval Models: An Out-of-Distribution Perspective. (arXiv:2306.12756v1 [cs.IR])
    Recently, we have witnessed generative retrieval increasingly gaining attention in the information retrieval (IR) field, which retrieves documents by directly generating their identifiers. So far, much effort has been devoted to developing effective generative retrieval models. There has been less attention paid to the robustness perspective. When a new retrieval paradigm enters into the real-world application, it is also critical to measure the out-of-distribution (OOD) generalization, i.e., how would generative retrieval models generalize to new distributions. To answer this question, firstly, we define OOD robustness from three perspectives in retrieval problems: 1) The query variations; 2) The unforeseen query types; and 3) The unforeseen tasks. Based on this taxonomy, we conduct empirical studies to analyze the OOD robustness of several representative generative retrieval models against dense retrieval models. The empirical results indicate that the OOD robustness of generative retrieval models requires enhancement. We hope studying the OOD robustness of generative retrieval models would be advantageous to the IR community.
    Density Uncertainty Layers for Reliable Uncertainty Estimation. (arXiv:2306.12497v1 [cs.LG])
Assessing the predictive uncertainty of deep neural networks is crucial for safety-related applications of deep learning. Although Bayesian deep learning offers a principled framework for estimating model uncertainty, the approaches that are commonly used to approximate the posterior often fail to deliver reliable estimates of predictive uncertainty. In this paper, we propose a novel criterion for predictive uncertainty: a model's predictive variance should be grounded in the empirical density of the input. That is, it should produce higher uncertainty for inputs that are improbable in the training data and lower uncertainty for inputs that are more probable. To operationalize this criterion, we develop the density uncertainty layer, an architectural element for a stochastic neural network that guarantees the density uncertainty criterion is satisfied. We study neural networks with density uncertainty layers on the CIFAR-10 and CIFAR-100 uncertainty benchmarks. Compared to existing approaches, we find that density uncertainty layers provide reliable uncertainty estimates and robust out-of-distribution detection performance.
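A toy illustration of the criterion, not the paper's layer: tie uncertainty to the empirical input density by fitting a density model on the training inputs and reporting low log-density as high uncertainty.

```python
# Illustrative only: density-grounded uncertainty via a Gaussian mixture fit
# on training inputs; improbable inputs get high uncertainty scores.
import numpy as np
from sklearn.mixture import GaussianMixture

X_train = np.random.randn(5000, 16)               # stand-in training inputs
density = GaussianMixture(n_components=8, random_state=0).fit(X_train)

def input_uncertainty(x):
    return -density.score_samples(x.reshape(1, -1))[0]   # high where x is improbable

print(input_uncertainty(np.zeros(16)))            # in-distribution: low score
print(input_uncertainty(10 * np.ones(16)))        # far from training data: high score
```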
    Factors Affecting the Performance of Automated Speaker Verification in Alzheimer's Disease Clinical Trials. (arXiv:2306.12444v1 [eess.AS])
    Detecting duplicate patient participation in clinical trials is a major challenge because repeated patients can undermine the credibility and accuracy of the trial's findings and result in significant health and financial risks. Developing accurate automated speaker verification (ASV) models is crucial to verify the identity of enrolled individuals and remove duplicates, but the size and quality of data influence ASV performance. However, there has been limited investigation into the factors that can affect ASV capabilities in clinical environments. In this paper, we bridge the gap by conducting analysis of how participant demographic characteristics, audio quality criteria, and severity level of Alzheimer's disease (AD) impact the performance of ASV utilizing a dataset of speech recordings from 659 participants with varying levels of AD, obtained through multiple speech tasks. Our results indicate that ASV performance: 1) is slightly better on male speakers than on female speakers; 2) degrades for individuals who are above 70 years old; 3) is comparatively better for non-native English speakers than for native English speakers; 4) is negatively affected by clinician interference, noisy background, and unclear participant speech; 5) tends to decrease with an increase in the severity level of AD. Our study finds that voice biometrics raise fairness concerns as certain subgroups exhibit different ASV performances owing to their inherent voice characteristics. Moreover, the performance of ASV is influenced by the quality of speech recordings, which underscores the importance of improving the data collection settings in clinical trials.
    Robust Statistical Comparison of Random Variables with Locally Varying Scale of Measurement. (arXiv:2306.12803v1 [stat.ML])
Spaces with locally varying scale of measurement, like multidimensional structures with differently scaled dimensions, are common in statistics and machine learning. Nevertheless, it is still an open question how to properly exploit the entire information encoded in them. We address this problem by considering an order based on (sets of) expectations of random variables mapping into such non-standard spaces. This order contains stochastic dominance and expectation order as extreme cases when no, or respectively perfect, cardinal structure is given. We derive a (regularized) statistical test for our proposed generalized stochastic dominance (GSD) order, operationalize it by linear optimization, and robustify it by imprecise probability models. Our findings are illustrated with data from multidimensional poverty measurement, finance, and medicine.
    Knowledge Distillation via Token-level Relationship Graph. (arXiv:2306.12442v1 [cs.LG])
Knowledge distillation is a powerful technique for transferring knowledge from a pre-trained teacher model to a student model. However, the true potential of knowledge transfer has not been fully explored. Existing approaches primarily focus on distilling individual information or instance-level relationships, overlooking the valuable information embedded in token-level relationships, which may be particularly affected by long-tail effects. To address the above limitations, we propose a novel method called Knowledge Distillation with Token-level Relationship Graph (TRG) that leverages token-wise relational knowledge to enhance the performance of knowledge distillation. By employing TRG, the student model can effectively emulate higher-level semantic information from the teacher model, resulting in improved distillation results. To further enhance the learning process, we introduce a token-wise contextual loss, which encourages the student model to capture the inner-instance semantic context of the teacher model. We conduct experiments to evaluate the effectiveness of the proposed method against several state-of-the-art approaches. Empirical results demonstrate the superiority of TRG across various visual classification tasks, including those involving imbalanced data. Our method consistently outperforms the existing baselines, establishing a new state-of-the-art performance in the field of knowledge distillation.
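One plausible reading of a token-level relationship graph is the pairwise token-similarity (Gram) matrix; matching these between teacher and student gives a relational loss of the kind described, though the paper's exact graph construction may differ:

```python
# Sketch of a token-level relational distillation loss: match pairwise
# token-similarity graphs of teacher and student per sample.
import torch
import torch.nn.functional as F

def trg_loss(student_tokens, teacher_tokens):
    # tokens: (batch, num_tokens, dim); dims may differ between the two models
    s = F.normalize(student_tokens, dim=-1)
    t = F.normalize(teacher_tokens, dim=-1)
    g_s = s @ s.transpose(1, 2)            # (batch, tokens, tokens) similarity graph
    g_t = t @ t.transpose(1, 2)
    return F.mse_loss(g_s, g_t)

# Usage with random stand-in token embeddings (e.g. ViT-style, 197 tokens):
print(trg_loss(torch.randn(4, 197, 384), torch.randn(4, 197, 768)))
```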
    Deep Dynamic Epidemiological Modelling for COVID-19 Forecasting in Multi-level Districts. (arXiv:2306.12457v1 [cs.LG])
Objective: COVID-19 has spread worldwide and has had an enormous impact across the globe. Modeling the infectious spread of COVID-19 is essential to understanding the current situation and formulating intervention measures. Epidemiological equations based on the SEIR model simulate disease development. Traditional parameter estimation methods for solving SEIR equations cannot precisely fit real-world data owing to varying circumstances, such as social distancing policies and intervention strategies. Additionally, learning-based models achieve outstanding fitting performance but cannot visualize mechanisms. Methods: We therefore propose a deep dynamic epidemiological (DDE) method that combines epidemiological equations with the advantages of deep learning to obtain both high accuracy and visualization. The DDE contains deep networks that fit the effect function to simulate ever-changing situations, based on the neural ODE method for solving the variants' equations, ensuring good fitting performance across multi-level areas. Results: We introduce four SEIR variants to fit different situations in different countries and regions. We compare our DDE method with traditional parameter estimation methods (Nelder-Mead, BFGS, Powell, Truncated Newton Conjugate-Gradient, Neural ODE) in fitting real-world data for countries (the USA, Colombia, South Africa) and regions (Wuhan in China, Piedmont in Italy). Our DDE method achieves the best Mean Square Error and Pearson coefficient in all five areas. Further, compared with state-of-the-art learning-based approaches, the DDE outperforms all techniques, including LSTM, RNN, GRU, Random Forest, Extremely Random Trees, and Decision Tree. Conclusion: DDE presents outstanding predictive ability and a visual display of the changes in infection rates across different regions and countries.
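For reference, the plain SEIR backbone that the DDE method augments with a learned effect function can be integrated with scipy; the rates below are illustrative constants:

```python
# Plain SEIR system: S -> E -> I -> R with constant rates.
import numpy as np
from scipy.integrate import solve_ivp

beta, sigma, gamma, N = 0.35, 1 / 5.2, 1 / 10, 1e6   # illustrative parameters

def seir(t, y):
    S, E, I, R = y
    return [-beta * S * I / N,              # susceptible -> exposed
            beta * S * I / N - sigma * E,   # exposed -> infectious
            sigma * E - gamma * I,          # infectious -> recovered
            gamma * I]

sol = solve_ivp(seir, (0, 160), [N - 10, 10, 0, 0], t_eval=np.linspace(0, 160, 161))
print(sol.y[2].max())                       # peak number of infectious individuals
```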
    On Addressing the Limitations of Graph Neural Networks. (arXiv:2306.12640v1 [cs.LG])
    This report gives a summary of two problems about graph convolutional networks (GCNs): over-smoothing and heterophily challenges, and outlines future directions to explore.
    Comparative Analysis of Segment Anything Model and U-Net for Breast Tumor Detection in Ultrasound and Mammography Images. (arXiv:2306.12510v1 [eess.IV])
    In this study, the main objective is to develop an algorithm capable of identifying and delineating tumor regions in breast ultrasound (BUS) and mammographic images. The technique employs two advanced deep learning architectures, namely U-Net and pretrained SAM, for tumor segmentation. The U-Net model is specifically designed for medical image segmentation and leverages its deep convolutional neural network framework to extract meaningful features from input images. On the other hand, the pretrained SAM architecture incorporates a mechanism to capture spatial dependencies and generate segmentation results. Evaluation is conducted on a diverse dataset containing annotated tumor regions in BUS and mammographic images, covering both benign and malignant tumors. This dataset enables a comprehensive assessment of the algorithm's performance across different tumor types. Results demonstrate that the U-Net model outperforms the pretrained SAM architecture in accurately identifying and segmenting tumor regions in both BUS and mammographic images. The U-Net exhibits superior performance in challenging cases involving irregular shapes, indistinct boundaries, and high tumor heterogeneity. In contrast, the pretrained SAM architecture exhibits limitations in accurately identifying tumor areas, particularly for malignant tumors and objects with weak boundaries or complex shapes. These findings highlight the importance of selecting appropriate deep learning architectures tailored for medical image segmentation. The U-Net model showcases its potential as a robust and accurate tool for tumor detection, while the pretrained SAM architecture suggests the need for further improvements to enhance segmentation performance.
    State-wise Constrained Policy Optimization. (arXiv:2306.12594v1 [cs.LG])
    Reinforcement Learning (RL) algorithms have shown tremendous success in simulation environments, but their application to real-world problems faces significant challenges, with safety being a major concern. In particular, enforcing state-wise constraints is essential for many challenging tasks such as autonomous driving and robot manipulation. However, existing safe RL algorithms under the framework of Constrained Markov Decision Process (CMDP) do not consider state-wise constraints. To address this gap, we propose State-wise Constrained Policy Optimization (SCPO), the first general-purpose policy search algorithm for state-wise constrained reinforcement learning. SCPO provides guarantees for state-wise constraint satisfaction in expectation. In particular, we introduce the framework of Maximum Markov Decision Process, and prove that the worst-case safety violation is bounded under SCPO. We demonstrate the effectiveness of our approach on training neural network policies for extensive robot locomotion tasks, where the agent must satisfy a variety of state-wise safety constraints. Our results show that SCPO significantly outperforms existing methods and can handle state-wise constraints in high-dimensional robotics tasks.
    Don't be so Monotone: Relaxing Stochastic Line Search in Over-Parameterized Models. (arXiv:2306.12747v1 [math.OC])
    Recent works have shown that line search methods can speed up Stochastic Gradient Descent (SGD) and Adam in modern over-parameterized settings. However, existing line searches may take steps that are smaller than necessary since they require a monotone decrease of the (mini-)batch objective function. We explore nonmonotone line search methods to relax this condition and possibly accept larger step sizes. Despite the lack of a monotonic decrease, we prove the same fast rates of convergence as in the monotone case. Our experiments show that nonmonotone methods improve the speed of convergence and generalization properties of SGD/Adam even beyond the previous monotone line searches. We propose a POlyak NOnmonotone Stochastic (PoNoS) method, obtained by combining a nonmonotone line search with a Polyak initial step size. Furthermore, we develop a new resetting technique that in the majority of the iterations reduces the amount of backtracks to zero while still maintaining a large initial step size. To the best of our knowledge, a first runtime comparison shows that the epoch-wise advantage of line-search-based methods gets reflected in the overall computational time.
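A schematic of the PoNoS recipe under stated assumptions (interpolation, so $f^* = 0$ for the Polyak step; deterministic objective for clarity): accept a step if the loss falls below the maximum over a window of past losses minus a sufficient-decrease term, otherwise backtrack.

```python
# Nonmonotone backtracking line search with a Polyak initial step (schematic).
import numpy as np

def ponos_step(w, f, grad, history, c=0.1, beta=0.7, window=10):
    g = grad(w)
    eta = f(w) / (g @ g + 1e-12)          # Polyak initial step, assuming f* = 0
    ref = max(history[-window:])          # nonmonotone reference: window maximum
    while f(w - eta * g) > ref - c * eta * (g @ g):
        eta *= beta                       # backtrack only if the window max is violated
    w_new = w - eta * g
    history.append(f(w_new))
    return w_new

# Usage on a simple quadratic with minimum value 0:
A = np.diag([1.0, 10.0])
f = lambda w: 0.5 * w @ A @ w
grad = lambda w: A @ w
w = np.array([1.0, 1.0]); hist = [f(w)]
for _ in range(20):
    w = ponos_step(w, f, grad, hist)
print(f(w))
```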
    Can Differentiable Decision Trees Learn Interpretable Reward Functions?. (arXiv:2306.13004v1 [cs.LG])
There is an increasing interest in learning reward functions that model human intent and human preferences. However, many frameworks use blackbox learning methods that, while expressive, are difficult to interpret. We propose and evaluate a novel approach for learning expressive and interpretable reward functions from preferences using Differentiable Decision Trees (DDTs) for both low- and high-dimensional state inputs. We explore and discuss the viability of learning interpretable reward functions using DDTs by evaluating our algorithm on Cartpole, Visual Gridworld environments, and Atari games. We provide evidence that the tree structure of our learned reward function is useful in determining the extent to which a reward function is aligned with human preferences. We visualize the learned reward DDTs and find that they are capable of learning interpretable reward functions, but that the discrete nature of the trees hurts the performance of reinforcement learning at test time. However, we also show evidence that using soft outputs (averaged over all leaf nodes) results in competitive performance when compared with larger-capacity deep neural network reward functions.
    CosmoPower-JAX: high-dimensional Bayesian inference with differentiable cosmological emulators. (arXiv:2305.06347v2 [astro-ph.CO] UPDATED)
    We present CosmoPower-JAX, a JAX-based implementation of the CosmoPower framework, which accelerates cosmological inference by building neural emulators of cosmological power spectra. We show how, using the automatic differentiation, batch evaluation and just-in-time compilation features of JAX, and running the inference pipeline on graphics processing units (GPUs), parameter estimation can be accelerated by orders of magnitude with advanced gradient-based sampling techniques. These can be used to efficiently explore high-dimensional parameter spaces, such as those needed for the analysis of next-generation cosmological surveys. We showcase the accuracy and computational efficiency of CosmoPower-JAX on two simulated Stage IV configurations. We first consider a single survey performing a cosmic shear analysis totalling 37 model parameters. We validate the contours derived with CosmoPower-JAX and a Hamiltonian Monte Carlo sampler against those derived with a nested sampler and without emulators, obtaining a speed-up factor of $\mathcal{O}(10^3)$. We then consider a combination of three Stage IV surveys, each performing a joint cosmic shear and galaxy clustering (3x2pt) analysis, for a total of 157 model parameters. Even with such a high-dimensional parameter space, CosmoPower-JAX provides converged posterior contours in 3 days, as opposed to the estimated 6 years required by standard methods. CosmoPower-JAX is fully written in Python, and we make it publicly available to help the cosmological community meet the accuracy requirements set by next-generation surveys.
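The enabling trick is that once the emulator is a JAX function, the whole log-posterior is differentiable end-to-end, so gradient-based samplers such as HMC apply directly. A sketch, where the emulator, data vector, and inverse covariance are toy placeholders:

```python
# Differentiable log-posterior through a (stand-in) neural spectrum emulator.
import jax
import jax.numpy as jnp

def emulator(params):                     # stand-in for the neural power-spectrum emulator
    ell = jnp.arange(1.0, 100.0)
    return params[0] * ell ** params[1]   # toy "power spectrum"

data = emulator(jnp.array([2.0, -1.0])) + 0.01   # fake data vector
cov_inv = jnp.eye(99)

def log_posterior(params):
    r = data - emulator(params)           # Gaussian likelihood, flat prior
    return -0.5 * r @ cov_inv @ r

grad_logp = jax.jit(jax.grad(log_posterior))     # exact gradients for HMC/NUTS proposals
print(grad_logp(jnp.array([2.1, -1.0])))
```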
    Improving Proactive Dialog Agents Using Socially-Aware Reinforcement Learning. (arXiv:2211.15359v2 [cs.CL] UPDATED)
    The next step for intelligent dialog agents is to escape their role as silent bystanders and become proactive. Well-defined proactive behavior may improve human-machine cooperation, as the agent takes a more active role during interaction and takes off responsibility from the user. However, proactivity is a double-edged sword because poorly executed pre-emptive actions may have a devastating effect not only on the task outcome but also on the relationship with the user. For designing adequate proactive dialog strategies, we propose a novel approach including both social as well as task-relevant features in the dialog. Here, the primary goal is to optimize proactive behavior so that it is task-oriented - this implies high task success and efficiency - while also being socially effective by fostering user trust. Including both aspects in the reward function for training a proactive dialog agent using reinforcement learning showed the benefit of our approach for more successful human-machine cooperation.
    Explore, Establish, Exploit: Red Teaming Language Models from Scratch. (arXiv:2306.09442v2 [cs.CL] UPDATED)
Deploying large language models (LLMs) can pose hazards from harmful outputs such as toxic or dishonest speech. Prior work has introduced tools that elicit harmful outputs in order to identify and mitigate these risks. While this is a valuable step toward securing language models, these approaches typically rely on a pre-existing classifier for undesired outputs. This limits their application to situations where the type of harmful behavior is known with precision beforehand. However, this skips a central challenge of red teaming: developing a contextual understanding of the behaviors that a model can exhibit. Furthermore, when such a classifier already exists, red teaming has limited marginal value because the classifier could simply be used to filter training data or model outputs. In this work, we consider red teaming under the assumption that the adversary is working from a high-level, abstract specification of undesired behavior. The red team is expected to refine/extend this specification and identify methods to elicit this behavior from the model. Our red teaming framework consists of three steps: 1) Exploring the model's behavior in the desired context; 2) Establishing a measurement of undesired behavior (e.g., a classifier trained to reflect human evaluations); and 3) Exploiting the model's flaws using this measure and an established red teaming methodology. We apply this approach to red team GPT-2 and GPT-3 models to systematically discover classes of prompts that elicit toxic and dishonest statements. In doing so, we also construct and release the CommonClaim dataset of 20,000 statements that have been labeled by human subjects as common-knowledge-true, common-knowledge-false, or neither. Code is available at https://github.com/thestephencasper/explore_establish_exploit_llms. CommonClaim is available at https://github.com/Algorithmic-Alignment-Lab/CommonClaim.
    Graph Pooling for Graph Neural Networks: Progress, Challenges, and Opportunities. (arXiv:2204.07321v2 [cs.LG] UPDATED)
    Graph neural networks have emerged as a leading architecture for many graph-level tasks, such as graph classification and graph generation. As an essential component of the architecture, graph pooling is indispensable for obtaining a holistic graph-level representation of the whole graph. Although a great variety of methods have been proposed in this promising and fast-developing research field, to the best of our knowledge, little effort has been made to systematically summarize these works. To set the stage for the development of future works, in this paper, we attempt to fill this gap by providing a broad review of recent methods for graph pooling. Specifically, 1) we first propose a taxonomy of existing graph pooling methods with a mathematical summary for each category; 2) then, we provide an overview of the libraries related to graph pooling, including the commonly used datasets, model architectures for downstream tasks, and open-source implementations; 3) next, we further outline the applications that incorporate the idea of graph pooling in a variety of domains; 4) finally, we discuss certain critical challenges facing current studies and share our insights on future potential directions for research on the improvement of graph pooling.
    On-line reinforcement learning for optimization of real-life energy trading strategy. (arXiv:2303.16266v2 [cs.LG] UPDATED)
An increasing share of energy is produced from renewable sources by many small producers. The efficiency of these sources is volatile and, to some extent, random, exacerbating the problem of energy market balancing. In many countries, this balancing is done on the day-ahead (DA) energy markets. This paper considers automated trading on the DA energy market by a medium-sized prosumer. We model this activity as a Markov Decision Process and formalize a framework in which a strategy applicable in real life can be optimized with off-line data. We design a trading strategy that is fed the available environmental information that can impact future prices, including weather forecasts. We use state-of-the-art reinforcement learning (RL) algorithms to optimize this strategy. For comparison, we also synthesize a simple parametric trading strategy and optimize it with an evolutionary algorithm. Results show that our RL-based strategy generates the highest market profits.
    Online Self-Supervised Learning in Machine Learning Intrusion Detection for the Internet of Things. (arXiv:2306.13030v1 [cs.CR])
    This paper proposes a novel Self-Supervised Intrusion Detection (SSID) framework, which enables a fully online Machine Learning (ML) based Intrusion Detection System (IDS) that requires no human intervention or prior off-line learning. The proposed framework analyzes and labels incoming traffic packets based only on the decisions of the IDS itself, made using an Auto-Associative Deep Random Neural Network, and on an online estimate of its statistically measured trustworthiness. The SSID framework enables the IDS to adapt rapidly to time-varying characteristics of the network traffic and eliminates the need for offline data collection. It avoids human errors in data labeling, as well as the human labor and computational costs of model training and data collection. The approach is experimentally evaluated on public datasets and compared with well-known ML models, showing that the SSID framework is very useful and advantageous as an accurate, online-learning ML-based IDS for IoT systems.
    Learning topological defects formation with neural networks in a quantum phase transition. (arXiv:2204.06769v2 [cond-mat.dis-nn] CROSS LISTED)
    Neural networks possess formidable representational power, rendering them invaluable in solving complex quantum many-body systems. While they excel at analyzing static solutions, nonequilibrium processes, including critical dynamics during a quantum phase transition, pose a greater challenge for neural networks. To address this, we utilize neural networks and machine learning algorithms to investigate the time evolutions, universal statistics, and correlations of topological defects in a one-dimensional transverse-field quantum Ising model. Specifically, our analysis involves computing the energy of the system during a quantum phase transition following a linear quench of the transverse magnetic field strength. The excitation energies satisfy a power-law relation to the quench rate, indicating a proportional relationship between the excitation energy and the kink numbers. Moreover, we establish a universal power-law relationship between the first three cumulants of the kink numbers and the quench rate, indicating a binomial distribution of the kinks. Finally, the normalized kink-kink correlations are also investigated and it is found that the numerical values are consistent with the analytic formula.
    Finding neural signatures for obesity through feature selection on source-localized EEG. (arXiv:2208.14007v3 [cs.LG] UPDATED)
    Obesity is a serious issue in modern society and is often associated with significantly reduced quality of life. Current research exploring obesity-related neurological evidence using electroencephalography (EEG) data is limited to traditional approaches. In this study, we developed a novel machine learning model to identify brain networks of obese females using alpha band functional connectivity features derived from EEG data. An overall classification accuracy of 0.937 is achieved. Our finding suggests that the obese brain is characterized by a dysfunctional network in which the areas responsible for processing self-referential information and environmental context information are impaired.
    FDINet: Protecting against DNN Model Extraction via Feature Distortion Index. (arXiv:2306.11338v2 [cs.CR] UPDATED)
    Machine Learning as a Service (MLaaS) platforms have gained popularity due to their accessibility, cost-efficiency, scalability, and rapid development capabilities. However, recent research has highlighted the vulnerability of cloud-based models in MLaaS to model extraction attacks. In this paper, we introduce FDINET, a novel defense mechanism that leverages the feature distribution of deep neural network (DNN) models. Concretely, by analyzing the feature distribution from the adversary's queries, we reveal that the feature distribution of these queries deviates from that of the model's training set. Based on this key observation, we propose the Feature Distortion Index (FDI), a metric designed to quantitatively measure the feature distribution deviation of received queries. The proposed FDINET utilizes FDI to train a binary detector and exploits FDI similarity to identify colluding adversaries from distributed extraction attacks. We conduct extensive experiments to evaluate FDINET against six state-of-the-art extraction attacks on four benchmark datasets and four popular model architectures. Empirical results demonstrate the following findings: FDINET proves to be highly effective in detecting model extraction, achieving a 100% detection accuracy on DFME and DaST; it is highly efficient, using just 50 queries to raise an extraction alarm with an average confidence of 96.08% for GTSRB; and it exhibits the capability to identify colluding adversaries with an accuracy exceeding 91%. Additionally, it demonstrates the ability to detect two types of adaptive attacks.
    Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories. (arXiv:2210.06518v3 [cs.LG] UPDATED)
    Natural agents can effectively learn from multiple data sources that differ in size, quality, and types of measurements. We study this heterogeneity in the context of offline reinforcement learning (RL) by introducing a new, practically motivated semi-supervised setting. Here, an agent has access to two sets of trajectories: labelled trajectories containing state, action and reward triplets at every timestep, along with unlabelled trajectories that contain only state and reward information. For this setting, we develop and study a simple meta-algorithmic pipeline that learns an inverse dynamics model on the labelled data to obtain proxy-labels for the unlabelled data, followed by the use of any offline RL algorithm on the true and proxy-labelled trajectories. Empirically, we find this simple pipeline to be highly successful -- on several D4RL benchmarks (Fu et al., 2020), certain offline RL algorithms can match the performance of variants trained on a fully labelled dataset even when we label only 10% of trajectories which are highly suboptimal. To strengthen our understanding, we perform a large-scale controlled empirical study investigating the interplay of data-centric properties of the labelled and unlabelled datasets, with algorithmic design choices (e.g., choice of inverse dynamics, offline RL algorithm) to identify general trends and best practices for training RL agents on semi-supervised offline datasets.
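    As a rough illustration of the pipeline described above (not the paper's code), the proxy-labeling step might look like the following sketch, with toy numpy arrays and a scikit-learn regressor standing in for the inverse dynamics model; all shapes, the toy dynamics, and the estimator choice are assumptions for illustration:

        import numpy as np
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(0)
        # Labelled trajectories: (state, action, next_state) transitions.
        s_lab = rng.normal(size=(1000, 4))
        a_lab = rng.normal(size=(1000, 2))
        s_next_lab = s_lab + 0.1 * np.pad(a_lab, ((0, 0), (0, 2)))  # toy dynamics

        # Step 1: fit an inverse dynamics model a ~ f(s, s').
        inv_dyn = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500)
        inv_dyn.fit(np.hstack([s_lab, s_next_lab]), a_lab)

        # Step 2: proxy-label the action-free trajectories.
        s_unl = rng.normal(size=(9000, 4))
        s_next_unl = s_unl + rng.normal(scale=0.05, size=s_unl.shape)
        a_proxy = inv_dyn.predict(np.hstack([s_unl, s_next_unl]))

        # Step 3: pool true and proxy-labelled transitions and hand the
        # combined dataset to any off-the-shelf offline RL algorithm.
        states = np.vstack([s_lab, s_unl])
        actions = np.vstack([a_lab, a_proxy])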
    CounterNet: End-to-End Training of Prediction Aware Counterfactual Explanations. (arXiv:2109.07557v3 [cs.LG] UPDATED)
    This work presents CounterNet, a novel end-to-end learning framework which integrates Machine Learning (ML) model training and the generation of corresponding counterfactual (CF) explanations into a single end-to-end pipeline. Counterfactual explanations offer a contrastive case, i.e., they attempt to find the smallest modification to the feature values of an instance that changes the prediction of the ML model on that instance to a predefined output. Prior techniques for generating CF explanations suffer from two major limitations: (i) all of them are post-hoc methods designed for use with proprietary ML models -- as a result, their procedure for generating CF explanations is uninformed by the training of the ML model, which leads to misalignment between model predictions and explanations; and (ii) most of them rely on solving separate time-intensive optimization problems to find CF explanations for each input data point (which negatively impacts their runtime). CounterNet makes a novel departure from this prevalent post-hoc paradigm: unlike post-hoc methods, it optimizes the CF explanation generation only once, together with the predictive model. We adopt a block-wise coordinate descent procedure which helps in effectively training CounterNet's network. Our extensive experiments on multiple real-world datasets show that CounterNet generates high-quality predictions, consistently achieves 100% CF validity and low proximity scores (thereby achieving a well-balanced cost-invalidity trade-off) for any new input instance, and runs 3X faster than existing state-of-the-art baselines.
    Reinforcement Federated Learning Method Based on Adaptive OPTICS Clustering. (arXiv:2306.12859v1 [cs.LG])
    Federated learning is a distributed machine learning technology which balances data privacy protection against shared computation: participating devices execute training locally, and only the resulting local models are aggregated into a global model. A known problem in federated learning is the negative impact of non-independent and identically distributed (non-IID) data across different user terminals. To alleviate this problem, this paper proposes a strengthened federated aggregation method based on adaptive OPTICS clustering. Specifically, the method perceives the clustering environment as a Markov decision process and models the adjustment of the parameter search direction, so as to find the best clustering parameters without artificial assistance and thereby obtain the best federated aggregation. The core contribution of this paper is an adaptive OPTICS clustering algorithm for federated learning that combines OPTICS clustering with adaptive learning to effectively handle non-IID data across user terminals. The reliability and practicability of this method have been verified on experimental data, and its effectiveness and superiority have been demonstrated.
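    The adaptive, MDP-driven parameter search is the paper's contribution and is not reproduced here; the sketch below only illustrates the underlying aggregation idea with plain scikit-learn OPTICS on flattened client updates, with all shapes and parameters chosen arbitrarily:

        import numpy as np
        from sklearn.cluster import OPTICS

        rng = np.random.default_rng(0)
        # Two groups of non-IID clients, so OPTICS has structure to find.
        g1 = rng.normal(loc=0.0, scale=0.1, size=(10, 100))
        g2 = rng.normal(loc=1.0, scale=0.1, size=(10, 100))
        client_updates = np.vstack([g1, g2])  # 20 clients, flattened model deltas

        # Cluster clients by the similarity of their updates; the paper tunes
        # the OPTICS parameters adaptively, here min_samples is simply fixed.
        labels = OPTICS(min_samples=3).fit_predict(client_updates)

        # Average within each cluster first, then across cluster means, so that
        # no single group of non-IID clients dominates the global update.
        cluster_means = [client_updates[labels == c].mean(axis=0)
                         for c in np.unique(labels) if c != -1]  # -1 = noise
        global_update = (np.mean(cluster_means, axis=0)
                         if cluster_means else client_updates.mean(axis=0))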
    Stock Price Prediction using Dynamic Neural Networks. (arXiv:2306.12969v1 [q-fin.ST])
    This paper will analyze and implement a time series dynamic neural network to predict daily closing stock prices. Neural networks possess unsurpassed abilities in identifying underlying patterns in chaotic, non-linear, and seemingly random data, thus providing a mechanism to predict stock price movements much more precisely than many current techniques. Contemporary methods for stock analysis, including fundamental, technical, and regression techniques, are discussed and compared with the performance of neural networks. Also, the Efficient Market Hypothesis (EMH) is presented and contrasted with Chaos theory using neural networks. This paper will refute the EMH and support Chaos theory. Finally, recommendations for using neural networks in stock price prediction will be presented.
    TSMixer: An all-MLP Architecture for Time Series Forecasting. (arXiv:2303.06053v3 [cs.LG] UPDATED)
    Real-world time-series datasets are often multivariate with complex dynamics. To capture this complexity, high capacity architectures like recurrent- or attention-based sequential deep learning models have become popular. However, recent work demonstrates that simple univariate linear models can outperform such deep learning models on several commonly used academic benchmarks. Extending them, in this paper, we investigate the capabilities of linear models for time-series forecasting and present Time-Series Mixer (TSMixer), a novel architecture designed by stacking multi-layer perceptrons (MLPs). TSMixer is based on mixing operations along both the time and feature dimensions to extract information efficiently. On popular academic benchmarks, the simple-to-implement TSMixer is comparable to specialized state-of-the-art models that leverage the inductive biases of specific benchmarks. On the challenging and large scale M5 benchmark, a real-world retail dataset, TSMixer demonstrates superior performance compared to the state-of-the-art alternatives. Our results underline the importance of efficiently utilizing cross-variate and auxiliary information for improving the performance of time series forecasting. We present various analyses to shed light into the capabilities of TSMixer. The design paradigms utilized in TSMixer are expected to open new horizons for deep learning-based time series forecasting. The implementation is available at https://github.com/google-research/google-research/tree/master/tsmixer
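    A minimal PyTorch sketch of the mixing idea (MLPs applied alternately along the time axis and the feature axis) is given below; the residual layout, hidden sizes, and omission of normalization are simplifications, not the paper's exact architecture:

        import torch
        import torch.nn as nn

        class MixingBlock(nn.Module):
            def __init__(self, seq_len: int, n_features: int, hidden: int = 64):
                super().__init__()
                # Time-mixing MLP, shared across features.
                self.time_mlp = nn.Sequential(nn.Linear(seq_len, hidden), nn.ReLU(),
                                              nn.Linear(hidden, seq_len))
                # Feature-mixing MLP, shared across time steps.
                self.feat_mlp = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                                              nn.Linear(hidden, n_features))

            def forward(self, x):  # x: (batch, seq_len, n_features)
                x = x + self.time_mlp(x.transpose(1, 2)).transpose(1, 2)  # mix time
                x = x + self.feat_mlp(x)                                  # mix features
                return x

        block = MixingBlock(seq_len=96, n_features=7)
        out = block(torch.randn(32, 96, 7))  # -> (32, 96, 7)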
    FFCV: Accelerating Training by Removing Data Bottlenecks. (arXiv:2306.12517v1 [cs.LG])
    We present FFCV, a library for easy and fast machine learning model training. FFCV speeds up model training by eliminating (often subtle) data bottlenecks from the training process. In particular, we combine techniques such as an efficient file storage format, caching, data pre-loading, asynchronous data transfer, and just-in-time compilation to (a) make data loading and transfer significantly more efficient, ensuring that GPUs can reach full utilization; and (b) offload as much data processing as possible to the CPU asynchronously, freeing GPU cycles for training. Using FFCV, we train ResNet-18 and ResNet-50 on the ImageNet dataset with a competitive tradeoff between accuracy and training time. For example, we are able to train an ImageNet ResNet-50 model to 75% accuracy in only 20 minutes on a single machine. We demonstrate FFCV's performance, ease-of-use, extensibility, and ability to adapt to resource constraints through several case studies. Detailed installation instructions, documentation, and a Slack support channel are available at https://ffcv.io/ .
    Improving Long-Horizon Imitation Through Instruction Prediction. (arXiv:2306.12554v1 [cs.LG])
    Complex, long-horizon planning and its combinatorial nature pose steep challenges for learning-based agents. Difficulties in such settings are exacerbated in low data regimes where over-fitting stifles generalization and compounding errors hurt accuracy. In this work, we explore the use of an often unused source of auxiliary supervision: language. Inspired by recent advances in transformer-based models, we train agents with an instruction prediction loss that encourages learning temporally extended representations that operate at a high level of abstraction. Concretely, we demonstrate that instruction modeling significantly improves performance in planning environments when training with a limited number of demonstrations on the BabyAI and Crafter benchmarks. In further analysis we find that instruction modeling is most important for tasks that require complex reasoning, while understandably offering smaller gains in environments that require simple plans. More details and code can be found at https://github.com/jhejna/instruction-prediction.
    Scrutinizing XAI using linear ground-truth data with suppressor variables. (arXiv:2111.07473v2 [stat.ML] UPDATED)
    Machine learning (ML) is increasingly often used to inform high-stakes decisions. As complex ML models (e.g., deep neural networks) are often considered black boxes, a wealth of procedures has been developed to shed light on their inner workings and the ways in which their predictions come about, defining the field of 'explainable AI' (XAI). Saliency methods rank input features according to some measure of 'importance'. Such methods are difficult to validate since a formal definition of feature importance is, thus far, lacking. It has been demonstrated that some saliency methods can highlight features that have no statistical association with the prediction target (suppressor variables). To avoid misinterpretations due to such behavior, we propose the actual presence of such an association as a necessary condition and objective preliminary definition for feature importance. We carefully crafted a ground-truth dataset in which all statistical dependencies are well-defined and linear, serving as a benchmark to study the problem of suppressor variables. We evaluate common explanation methods including LRP, DTD, PatternNet, PatternAttribution, LIME, Anchors, SHAP, and permutation-based methods with respect to our objective definition. We show that most of these methods are unable to distinguish important features from suppressors in this setting.
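    For intuition, the textbook suppressor construction this kind of benchmark builds on can be reproduced in a few lines of numpy (a generic illustration, not the paper's dataset): only x1 carries signal, while x2 is statistically unrelated to the target yet receives a large weight because it cancels the distractor inside x1:

        import numpy as np
        from sklearn.linear_model import LinearRegression

        rng = np.random.default_rng(0)
        n = 10_000
        z = rng.normal(size=n)           # signal
        d = rng.normal(size=n)           # distractor
        y = z                            # target depends on the signal only
        x1 = z + d                       # feature: signal contaminated by distractor
        x2 = d                           # suppressor: no association with y

        model = LinearRegression().fit(np.column_stack([x1, x2]), y)
        print(model.coef_)               # ~ [1, -1]: the suppressor gets weight -1
        print(np.corrcoef(x2, y)[0, 1])  # ~ 0: yet x2 is unrelated to the target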
    OptIForest: Optimal Isolation Forest for Anomaly Detection. (arXiv:2306.12703v1 [cs.LG])
    Anomaly detection plays an increasingly important role in various fields for critical tasks such as intrusion detection in cybersecurity, financial risk detection, and human health monitoring. A variety of anomaly detection methods have been proposed, and a category based on the isolation forest mechanism stands out due to its simplicity, effectiveness, and efficiency; e.g., iForest is often employed as a state-of-the-art detector for real deployment. While the majority of isolation forests use a binary structure, the LSHiForest framework has demonstrated that a multi-fork isolation tree structure can lead to better detection performance. However, no theoretical work has answered the fundamentally and practically important question of the optimal tree structure for an isolation forest with respect to the branching factor. In this paper, we establish a theory on isolation efficiency to answer the question and determine the optimal branching factor for an isolation tree. Based on this theoretical underpinning, we design a practical optimal isolation forest, OptIForest, which incorporates clustering-based learning to hash, enabling more information to be learned from data for better isolation quality. The rationale of our approach relies on a better bias-variance trade-off achieved by bias reduction in OptIForest. Extensive experiments on a series of benchmarking datasets for comparative and ablation studies demonstrate that our approach can efficiently and robustly achieve better detection performance in general than state-of-the-art methods, including deep learning based ones.
    Conditional Generators for Limit Order Book Environments: Explainability, Challenges, and Robustness. (arXiv:2306.12806v1 [q-fin.TR])
    Limit order books are a fundamental and widespread market mechanism. This paper investigates the use of conditional generative models for order book simulation. For developing a trading agent, this approach has drawn recent attention as an alternative to traditional backtesting due to its ability to react to the presence of the trading agent. Using a state-of-the-art CGAN (from Coletta et al. (2022)), we explore its dependence upon input features, which highlights both strengths and weaknesses. To do this, we use "adversarial attacks" on the model's features and its mechanism. We then show how these insights can be used to improve the CGAN, both in terms of its realism and robustness. We finish by laying out a roadmap for future work.
    Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference. (arXiv:2306.12509v1 [cs.CL])
    We view large language models (LLMs) as stochastic "language layers" in a network, where the learnable parameters are the natural language "prompts" at each layer. We stack two such layers, feeding the output of one layer to the next. We call the stacked architecture a Deep Language Network (DLN). We first show how to effectively perform prompt optimization for a 1-layer language network (DLN-1). We then show how to train 2-layer DLNs (DLN-2), where two prompts must be learnt. We consider the output of the first layer as a latent variable to marginalize, and devise a variational inference algorithm for joint prompt training. A DLN-2 reaches higher performance than a single layer, sometimes comparable to few-shot GPT-4 even when each LLM in the network is smaller and less powerful. The DLN code is open source: https://github.com/microsoft/deep-language-networks .
    A Systematic Survey in Geometric Deep Learning for Structure-based Drug Design. (arXiv:2306.11768v2 [q-bio.QM] UPDATED)
    Structure-based drug design (SBDD), which utilizes the three-dimensional geometry of proteins to identify potential drug candidates, is becoming increasingly vital in drug discovery. However, traditional methods based on physiochemical modeling and experts' domain knowledge are time-consuming and laborious. The recent advancements in geometric deep learning, which integrates and processes 3D geometric data, coupled with the availability of accurate protein 3D structure predictions from tools like AlphaFold, have significantly propelled progress in structure-based drug design. In this paper, we systematically review the recent progress of geometric deep learning for structure-based drug design. We start with a brief discussion of the mainstream tasks in structure-based drug design, commonly used 3D protein representations and representative predictive/generative models. Then we delve into detailed reviews for each task (binding site prediction, binding pose generation, de novo molecule generation, linker design, and binding affinity prediction), including the problem setup, representative methods, datasets, and evaluation metrics. Finally, we conclude this survey with the current challenges and highlight potential opportunities of geometric deep learning for structure-based drug design.
    Targeted collapse regularized autoencoder for anomaly detection: black hole at the center. (arXiv:2306.12627v1 [cs.LG])
    Autoencoders have been extensively used in the development of recent anomaly detection techniques. The premise of their application is based on the notion that after training the autoencoder on normal training data, anomalous inputs will exhibit a significant reconstruction error. Consequently, this enables a clear differentiation between normal and anomalous samples. In practice, however, it is observed that autoencoders can generalize beyond the normal class and achieve a small reconstruction error on some of the anomalous samples. To improve the performance, various techniques propose additional components and more sophisticated training procedures. In this work, we propose a remarkably straightforward alternative: instead of adding neural network components, involved computations, and cumbersome training, we complement the reconstruction loss with a computationally light term that regulates the norm of representations in the latent space. The simplicity of our approach minimizes the requirement for hyperparameter tuning and customization for new applications which, paired with its permissive data modality constraint, enhances the potential for successful adoption across a broad range of applications. We test the method on various visual and tabular benchmarks and demonstrate that the technique matches and frequently outperforms alternatives. We also provide a theoretical analysis and numerical simulations that help demonstrate the underlying process that unfolds during training and how it can help with anomaly detection. This mitigates the black-box nature of autoencoder-based anomaly detection algorithms and offers an avenue for further investigation of advantages, fail cases, and potential new directions.
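    Under the stated idea, the training objective reduces to an ordinary reconstruction loss plus a light latent-norm term. A PyTorch sketch, with an arbitrary MLP autoencoder and an illustrative weight lambda_, might look like:

        import torch
        import torch.nn as nn

        class AE(nn.Module):
            def __init__(self, d_in: int = 784, d_latent: int = 32):
                super().__init__()
                self.enc = nn.Sequential(nn.Linear(d_in, 128), nn.ReLU(),
                                         nn.Linear(128, d_latent))
                self.dec = nn.Sequential(nn.Linear(d_latent, 128), nn.ReLU(),
                                         nn.Linear(128, d_in))

            def forward(self, x):
                z = self.enc(x)
                return self.dec(z), z

        ae, lambda_ = AE(), 0.1
        x = torch.randn(64, 784)                  # a batch of "normal" samples
        x_hat, z = ae(x)
        # Reconstruction loss plus a light term that regulates the latent norm,
        # pulling normal-class representations toward a compact region.
        loss = ((x_hat - x) ** 2).mean() + lambda_ * z.norm(dim=1).mean()
        loss.backward()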
  • Open

    Sharp analysis of EM for learning mixtures of pairwise differences. (arXiv:2302.10066v2 [math.ST] UPDATED)
    We consider a symmetric mixture of linear regressions with random samples from the pairwise comparison design, which can be seen as a noisy version of a type of Euclidean distance geometry problem. We analyze the expectation-maximization (EM) algorithm locally around the ground truth and establish that the sequence converges linearly, providing an $\ell_\infty$-norm guarantee on the estimation error of the iterates. Furthermore, we show that the limit of the EM sequence achieves the sharp rate of estimation in the $\ell_2$-norm, matching the information-theoretically optimal constant. We also argue through simulation that convergence from a random initialization is much more delicate in this setting, and does not appear to occur in general. Our results show that the EM algorithm can exhibit several unique behaviors when the covariate distribution is suitably structured.
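    The EM iteration being analyzed has a closed form in this symmetric setting; a numpy sketch (with illustrative noise level and dimensions, and a local initialization as in the analysis) is:

        import numpy as np

        rng = np.random.default_rng(0)
        n, d, sigma = 5000, 10, 0.5
        theta_star = rng.normal(size=d)
        X = rng.normal(size=(n, d))              # pairwise-difference covariates
        signs = rng.choice([-1.0, 1.0], size=n)  # latent labels of the mixture
        y = signs * (X @ theta_star) + sigma * rng.normal(size=n)

        theta = theta_star + 0.5 * rng.normal(size=d)  # local initialization
        XtX_inv = np.linalg.inv(X.T @ X)
        for _ in range(50):
            w = np.tanh(y * (X @ theta) / sigma**2)  # E-step: posterior weights
            theta = XtX_inv @ (X.T @ (w * y))        # M-step: weighted least squares

        print(np.abs(theta - theta_star).max())      # ell_inf estimation error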
    Complex Preferences for Different Convergent Priors in Discrete Graph Diffusion. (arXiv:2306.02957v2 [cs.LG] UPDATED)
    Diffusion models have achieved state-of-the-art performance in generating many different kinds of data, including images, text, and videos. Despite their success, there has been limited research on how the underlying diffusion process and the final convergent prior can affect generative performance; this research has also been limited to continuous data types and a score-based diffusion framework. To fill this gap, we explore how different discrete diffusion kernels (which converge to different prior distributions) affect the performance of diffusion models for graphs. To this end, we developed a novel formulation of a family of discrete diffusion kernels which are easily adjustable to converge to different Bernoulli priors, and we study the effect of these different kernels on generative performance. We show that the quality of generated graphs is sensitive to the prior used, and that the optimal choice cannot be explained by obvious statistics or metrics, which challenges the intuitions which previous works have suggested.
    Sample Complexity for Quadratic Bandits: Hessian Dependent Bounds and Optimal Algorithms. (arXiv:2306.12383v2 [cs.LG] UPDATED)
    In stochastic zeroth-order optimization, a problem of practical relevance is understanding how to fully exploit the local geometry of the underlying objective function. We consider a fundamental setting in which the objective function is quadratic, and provide the first tight characterization of the optimal Hessian-dependent sample complexity. Our contribution is twofold. First, from an information-theoretic point of view, we prove tight lower bounds on Hessian-dependent complexities by introducing a concept called energy allocation, which captures the interaction between the searching algorithm and the geometry of objective functions. A matching upper bound is obtained by solving the optimal energy spectrum. Then, algorithmically, we show the existence of a Hessian-independent algorithm that universally achieves the asymptotic optimal sample complexities for all Hessian instances. The optimal sample complexities achieved by our algorithm remain valid for heavy-tailed noise distributions, which are enabled by a truncation method.
    Hierarchical Neural Simulation-Based Inference Over Event Ensembles. (arXiv:2306.12584v1 [stat.ML])
    When analyzing real-world data it is common to work with event ensembles, which comprise sets of observations that collectively constrain the parameters of an underlying model of interest. Such models often have a hierarchical structure, where "local" parameters impact individual events and "global" parameters influence the entire dataset. We introduce practical approaches for optimal dataset-wide probabilistic inference in cases where the likelihood is intractable, but simulations can be realized via forward modeling. We construct neural estimators for the likelihood(-ratio) or posterior and show that explicitly accounting for the model's hierarchical structure can lead to tighter parameter constraints. We ground our discussion using case studies from the physical sciences, focusing on examples from particle physics (particle collider data) and astrophysics (strong gravitational lensing observations).
    Fitted Value Iteration Methods for Bicausal Optimal Transport. (arXiv:2306.12658v1 [stat.ML])
    We develop a fitted value iteration (FVI) method to compute bicausal optimal transport (OT) where couplings have an adapted structure. Based on the dynamic programming formulation, FVI adopts a function class to approximate the value functions in bicausal OT. Under the concentrability condition and approximate completeness assumption, we prove the sample complexity using (local) Rademacher complexity. Furthermore, we demonstrate that multilayer neural networks with appropriate structures satisfy the crucial assumptions required in sample complexity proofs. Numerical experiments reveal that FVI outperforms linear programming and adapted Sinkhorn methods in scalability as the time horizon increases, while still maintaining acceptable accuracy.
    On the use of the Gram matrix for multivariate functional principal components analysis. (arXiv:2306.12949v1 [stat.ME])
    Dimension reduction is crucial in functional data analysis (FDA). The key tool to reduce the dimension of the data is functional principal component analysis. Existing approaches for functional principal component analysis usually involve the diagonalization of the covariance operator. With the increasing size and complexity of functional datasets, estimating the covariance operator has become more challenging. Therefore, there is a growing need for efficient methodologies to estimate the eigencomponents. Using the duality of the space of observations and the space of functional features, we propose to use the inner-product between the curves to estimate the eigenelements of multivariate and multidimensional functional datasets. The relationship between the eigenelements of the covariance operator and those of the inner-product matrix is established. We explore the application of these methodologies in several FDA settings and provide general guidance on their usability.
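    In the simplest discretized case the duality is easy to demonstrate: the eigenvectors of the small n x n Gram matrix yield the same principal components as the large covariance matrix. A numpy sketch, with synthetic curves standing in for functional data:

        import numpy as np

        rng = np.random.default_rng(0)
        n, p = 50, 2000                    # 50 curves observed on 2000 points
        X = rng.normal(size=(n, p))
        X = X - X.mean(axis=0)             # center the curves

        # Dual route: eigendecompose the n x n inner-product (Gram) matrix ...
        vals, U = np.linalg.eigh(X @ X.T)
        vals, U = vals[::-1], U[:, ::-1]

        # ... and map its eigenvectors back to eigenfunctions of the covariance.
        phi = (X.T @ U[:, :3]) / np.sqrt(vals[:3])

        # Check against the direct (p x p) covariance route on the top component.
        w, V = np.linalg.eigh(X.T @ X)
        print(abs(phi[:, 0] @ V[:, -1]))   # ~ 1 up to sign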
    Cryptocurrency Valuation: An Explainable AI Approach. (arXiv:2201.12893v7 [econ.GN] UPDATED)
    Currently, there are no convincing proxies for the fundamentals of cryptocurrency assets. We propose a new market-to-fundamental ratio, the price-to-utility (PU) ratio, utilizing unique blockchain accounting methods. We then proxy various existing fundamental-to-market ratios with Bitcoin historical data and find that they have little predictive power for short-term bitcoin returns. The PU ratio, however, predicts long-term bitcoin returns more effectively than alternative methods. Furthermore, we verify the explainability of the PU ratio using machine learning. Finally, we present an automated trading strategy advised by the PU ratio that outperforms the conventional buy-and-hold and market-timing strategies. Our research contributes to explainable AI in finance from three facets: first, our market-to-fundamental ratio is based on classic monetary theory and the unique UTXO model of Bitcoin accounting rather than being ad hoc; second, the empirical evidence testifies to the buy-low and sell-high implications of the ratio; finally, we distribute the trading algorithms as open-source software via the Python Package Index for future research, which is exceptional in finance research.
    Inferring the finest pattern of mutual independence from data. (arXiv:2306.12984v1 [stat.ML])
    For a random variable $X$, we are interested in the blind extraction of its finest mutual independence pattern $\mu ( X )$. We introduce a specific kind of independence that we call dichotomic. If $\Delta ( X )$ stands for the set of all patterns of dichotomic independence that hold for $X$, we show that $\mu ( X )$ can be obtained as the intersection of all elements of $\Delta ( X )$. We then propose a method to estimate $\Delta ( X )$ when the data are independent and identically distributed (i.i.d.) realizations of a multivariate normal distribution. If $\hat{\Delta} ( X )$ is the estimated set of valid patterns of dichotomic independence, we estimate $\mu ( X )$ as the intersection of all patterns of $\hat{\Delta} ( X )$. The method is tested on simulated data, showing its advantages and limits. We also consider an application to a toy example as well as to experimental data.
    Structured Learning in Time-dependent Cox Models. (arXiv:2306.12528v1 [stat.ME])
    Cox models with time-dependent coefficients and covariates are widely used in survival analysis. In high-dimensional settings, sparse regularization techniques are employed for variable selection, but existing methods for time-dependent Cox models lack flexibility in enforcing specific sparsity patterns (i.e., covariate structures). We propose a flexible framework for variable selection in time-dependent Cox models, accommodating complex selection rules. Our method can adapt to arbitrary grouping structures, including interaction selection, temporal, spatial, tree, and directed acyclic graph structures. It achieves accurate estimation with low false alarm rates. We develop the sox package, implementing a network flow algorithm for efficiently solving models with complex covariate structures. Sox offers a user-friendly interface for specifying grouping structures and delivers fast computation. Through examples, including a case study on identifying predictors of time to all-cause death in atrial fibrillation patients, we demonstrate the practical application of our method with specific selection rules.
    Finite-time Lyapunov exponents of deep neural networks. (arXiv:2306.12548v1 [cond-mat.dis-nn])
    We compute how small input perturbations affect the output of deep neural networks, exploring an analogy between deep networks and dynamical systems, where the growth or decay of local perturbations is characterised by finite-time Lyapunov exponents. We show that the maximal exponent forms geometrical structures in input space, akin to coherent structures in dynamical systems. Ridges of large positive exponents divide input space into different regions that the network associates with different classes. These ridges visualise the geometry that deep networks construct in input space, shedding light on the fundamental mechanisms underlying their learning capabilities.  ( 2 min )
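    For a concrete picture of the quantity involved, the maximal finite-time Lyapunov exponent at an input can be read off the largest singular value of the network's input-output Jacobian; the PyTorch sketch below uses a toy network and an illustrative per-layer normalization convention:

        import torch
        import torch.nn as nn

        torch.manual_seed(0)
        depth = 6
        net = nn.Sequential(*[nn.Sequential(nn.Linear(20, 20), nn.Tanh())
                              for _ in range(depth)])
        x = torch.randn(20)

        # Input-output Jacobian at x, then its largest singular value.
        J = torch.autograd.functional.jacobian(net, x)
        sigma_max = torch.linalg.svdvals(J)[0]

        # Growth rate per layer of an infinitesimal input perturbation.
        ftle = torch.log(sigma_max) / depth
        print(ftle.item())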
    Generalized Low-Rank Update: Model Parameter Bounds for Low-Rank Training Data Modifications. (arXiv:2306.12670v1 [stat.ML])
    In this study, we have developed an incremental machine learning (ML) method that efficiently obtains the optimal model when a small number of instances or features are added or removed. This problem holds practical importance in model selection, such as cross-validation (CV) and feature selection. Among the class of ML methods known as linear estimators, there exists an efficient model update framework called the low-rank update that can effectively handle changes in a small number of rows and columns within the data matrix. However, for ML methods beyond linear estimators, there is currently no comprehensive framework available to obtain knowledge about the updated solution within a specific computational complexity. In light of this, our study introduces a method called the Generalized Low-Rank Update (GLRU) which extends the low-rank update framework of linear estimators to ML methods formulated as a certain class of regularized empirical risk minimization, including commonly used methods such as SVM and logistic regression. The proposed GLRU method not only expands the range of its applicability but also provides information about the updated solutions with a computational complexity proportional to the amount of dataset changes. To demonstrate the effectiveness of the GLRU method, we conduct experiments showcasing its efficiency in performing cross-validation and feature selection compared to other baseline methods.  ( 2 min )
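    The GLRU framework itself targets regularized ERM beyond linear estimators; as background for the "low-rank update" it generalizes, here is the classical rank-one (Sherman-Morrison) update of a ridge-regression solution when one instance is added, on toy data:

        import numpy as np

        rng = np.random.default_rng(0)
        n, d, lam = 200, 10, 1.0
        X, y = rng.normal(size=(n, d)), rng.normal(size=n)

        A_inv = np.linalg.inv(X.T @ X + lam * np.eye(d))
        b = X.T @ y
        theta = A_inv @ b                       # ridge solution on n instances

        # Add one instance (x_new, y_new) in O(d^2) instead of refitting.
        x_new, y_new = rng.normal(size=d), rng.normal()
        Ax = A_inv @ x_new
        A_inv -= np.outer(Ax, Ax) / (1.0 + x_new @ Ax)   # Sherman-Morrison
        theta = A_inv @ (b + y_new * x_new)

        # Sanity check against a full refit.
        X2, y2 = np.vstack([X, x_new]), np.append(y, y_new)
        theta_full = np.linalg.solve(X2.T @ X2 + lam * np.eye(d), X2.T @ y2)
        print(np.allclose(theta, theta_full))   # True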
    Mitigating Discrimination in Insurance with Wasserstein Barycenters. (arXiv:2306.12912v1 [stat.ML])
    The insurance industry is heavily reliant on predictions of risks based on characteristics of potential customers. Although the use of said models is common, researchers have long pointed out that such practices perpetuate discrimination based on sensitive features such as gender or race. Given that such discrimination can often be attributed to historical data biases, an elimination or at least mitigation is desirable. With the shift from more traditional models to machine-learning based predictions, calls for greater mitigation have grown anew, as simply excluding sensitive variables in the pricing process can be shown to be ineffective. In this article, we first investigate why predictions are a necessity within the industry and why correcting biases is not as straightforward as simply identifying a sensitive variable. We then propose to ease the biases through the use of Wasserstein barycenters instead of simple scaling. To demonstrate the effects and effectiveness of the approach we employ it on real data and discuss its implications.  ( 2 min )
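    In one dimension the Wasserstein barycenter of the per-group score distributions is simply the average of their quantile functions, which makes the repair step easy to sketch on synthetic scores (group sizes, distributions, and the equal weighting are all illustrative):

        import numpy as np

        rng = np.random.default_rng(0)
        # Synthetic risk scores for two sensitive groups with a historical bias.
        scores_a = rng.normal(loc=0.6, scale=0.10, size=5000)
        scores_b = rng.normal(loc=0.4, scale=0.15, size=5000)

        # 1-D Wasserstein barycenter: average the two quantile functions.
        qs = np.linspace(0.001, 0.999, 999)
        barycenter_q = 0.5 * (np.quantile(scores_a, qs) + np.quantile(scores_b, qs))

        def repair(scores):
            # Map each score to the barycenter via its within-group rank.
            ranks = scores.argsort().argsort() / (len(scores) - 1)
            return np.interp(ranks, qs, barycenter_q)

        print(repair(scores_a).mean(), repair(scores_b).mean())  # now aligned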
    Classification Tree Pruning Under Covariate Shift. (arXiv:2305.04335v2 [stat.ML] UPDATED)
    We consider the problem of pruning a classification tree, that is, selecting a suitable subtree that balances bias and variance, in common situations with inhomogeneous training data. Namely, assuming access to mostly data from a distribution $P_{X, Y}$, but little data from a desired distribution $Q_{X, Y}$ with different $X$-marginals, we present the first efficient procedure for optimal pruning in such situations, when cross-validation and other penalized variants are grossly inadequate. Optimality is derived with respect to a notion of average discrepancy $P_{X} \to Q_{X}$ (averaged over $X$ space) which significantly relaxes a recent notion -- termed transfer-exponent -- shown to tightly capture the limits of classification under such a distribution shift. Our relaxed notion can be viewed as a measure of relative dimension between distributions, as it relates to existing notions of information such as the Minkowski and Renyi dimensions.  ( 2 min )
    Robust Statistical Comparison of Random Variables with Locally Varying Scale of Measurement. (arXiv:2306.12803v1 [stat.ML])
    Spaces with locally varying scale of measurement, like multidimensional structures with differently scaled dimensions, are pretty common in statistics and machine learning. Nevertheless, it is still understood as an open question how to exploit the entire information encoded in them properly. We address this problem by considering an order based on (sets of) expectations of random variables mapping into such non-standard spaces. This order contains stochastic dominance and expectation order as extreme cases when no, or respectively perfect, cardinal structure is given. We derive a (regularized) statistical test for our proposed generalized stochastic dominance (GSD) order, operationalize it by linear optimization, and robustify it by imprecise probability models. Our findings are illustrated with data from multidimensional poverty measurement, finance, and medicine.  ( 2 min )
    A One-Sample Decentralized Proximal Algorithm for Non-Convex Stochastic Composite Optimization. (arXiv:2302.09766v2 [math.OC] UPDATED)
    We focus on decentralized stochastic non-convex optimization, where $n$ agents work together to optimize a composite objective function which is a sum of a smooth term and a non-smooth convex term. To solve this problem, we propose two single-time scale algorithms: Prox-DASA and Prox-DASA-GT. These algorithms can find $\epsilon$-stationary points in $\mathcal{O}(n^{-1}\epsilon^{-2})$ iterations using constant batch sizes (i.e., $\mathcal{O}(1)$). Unlike prior work, our algorithms achieve comparable complexity without requiring large batch sizes, more complex per-iteration operations (such as double loops), or stronger assumptions. Our theoretical findings are supported by extensive numerical experiments, which demonstrate the superiority of our algorithms over previous approaches. Our code is available at https://github.com/xuxingc/ProxDASA.  ( 2 min )
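    The sketch below is not Prox-DASA itself, only its basic ingredients under simplifying assumptions: each agent takes a stochastic gradient step on its smooth local loss, neighbors average through a mixing matrix, and the non-smooth convex term (here the ell_1 norm) is handled by its proximal operator, soft-thresholding:

        import numpy as np

        rng = np.random.default_rng(0)
        n_agents, d, lam, lr = 5, 20, 0.05, 0.1
        W = np.full((n_agents, n_agents), 1.0 / n_agents)  # mixing (gossip) matrix
        targets = rng.normal(size=(n_agents, d))           # each agent's local data
        x = np.zeros((n_agents, d))

        def soft_threshold(v, t):
            return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

        for _ in range(200):
            # Stochastic gradients of smooth local losses 0.5 * ||x - target||^2.
            grads = (x - targets) + 0.1 * rng.normal(size=x.shape)
            x = W @ (x - lr * grads)           # gradient step + neighbor averaging
            x = soft_threshold(x, lr * lam)    # prox of the non-smooth ell_1 term

        print(np.abs(x - x.mean(axis=0)).max())  # consensus gap is near zero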
    SQ Lower Bounds for Learning Bounded Covariance GMMs. (arXiv:2306.13057v1 [cs.LG])
    We study the complexity of learning mixtures of separated Gaussians with common unknown bounded covariance matrix. Specifically, we focus on learning Gaussian mixture models (GMMs) on $\mathbb{R}^d$ of the form $P= \sum_{i=1}^k w_i \mathcal{N}(\boldsymbol \mu_i,\mathbf \Sigma_i)$, where $\mathbf \Sigma_i = \mathbf \Sigma \preceq \mathbf I$ and $\min_{i \neq j} \| \boldsymbol \mu_i - \boldsymbol \mu_j\|_2 \geq k^\epsilon$ for some $\epsilon>0$. Known learning algorithms for this family of GMMs have complexity $(dk)^{O(1/\epsilon)}$. In this work, we prove that any Statistical Query (SQ) algorithm for this problem requires complexity at least $d^{\Omega(1/\epsilon)}$. In the special case where the separation is on the order of $k^{1/2}$, we additionally obtain fine-grained SQ lower bounds with the correct exponent. Our SQ lower bounds imply similar lower bounds for low-degree polynomial tests. Conceptually, our results provide evidence that known algorithms for this problem are nearly best possible.  ( 2 min )
    Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model. (arXiv:2306.12968v1 [cs.SI])
    We consider the problem of recovering hidden communities in the Labeled Stochastic Block Model (LSBM) with a finite number of clusters, where cluster sizes grow linearly with the total number $n$ of items. In the LSBM, a label is (independently) observed for each pair of items. Our objective is to devise an efficient algorithm that recovers clusters using the observed labels. To this end, we revisit instance-specific lower bounds on the expected number of misclassified items satisfied by any clustering algorithm. We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability. IAC consists of a one-time spectral clustering algorithm followed by an iterative likelihood-based cluster assignment improvement. This approach is based on the instance-specific lower bound and does not require any model parameters, including the number of clusters. By performing the spectral clustering only once, IAC maintains an overall computational complexity of $\mathcal{O}(n \text{polylog}(n))$. We illustrate the effectiveness of our approach through numerical experiments.  ( 2 min )
    Breaking the Curse of Multiagents in a Large State Space: RL in Markov Games with Independent Linear Function Approximation. (arXiv:2302.03673v3 [cs.LG] UPDATED)
    We propose a new model, independent linear Markov game, for multi-agent reinforcement learning with a large state space and a large number of agents. This is a class of Markov games with independent linear function approximation, where each agent has its own function approximation for the state-action value functions that are marginalized by other players' policies. We design new algorithms for learning the Markov coarse correlated equilibria (CCE) and Markov correlated equilibria (CE) with sample complexity bounds that only scale polynomially with each agent's own function class complexity, thus breaking the curse of multiagents. In contrast, existing works for Markov games with function approximation have sample complexity bounds that scale with the size of the joint action space when specialized to the canonical tabular Markov game setting, which is exponentially large in the number of agents. Our algorithms rely on two key technical innovations: (1) utilizing policy replay to tackle non-stationarity incurred by multiple agents and the use of function approximation; (2) separating learning Markov equilibria and exploration in the Markov games, which allows us to use the full-information no-regret learning oracle instead of the stronger bandit-feedback no-regret learning oracle used in the tabular setting. Furthermore, we propose an iterative-best-response type algorithm that can learn pure Markov Nash equilibria in independent linear Markov potential games. In the tabular case, by adapting the policy replay mechanism for independent linear Markov games, we propose an algorithm with $\widetilde{O}(\epsilon^{-2})$ sample complexity to learn Markov CCE, which improves the state-of-the-art result $\widetilde{O}(\epsilon^{-3})$ in Daskalakis et al. 2022, where $\epsilon$ is the desired accuracy, and also significantly improves other problem parameters.  ( 3 min )
    AudioPaLM: A Large Language Model That Can Speak and Listen. (arXiv:2306.12925v1 [cs.CL])
    We introduce AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil et al., 2023] and AudioLM [Borsos et al., 2022], into a unified multimodal architecture that can process and generate text and speech with applications including speech recognition and speech-to-speech translation. AudioPaLM inherits the capability to preserve paralinguistic information such as speaker identity and intonation from AudioLM and the linguistic knowledge present only in text large language models such as PaLM-2. We demonstrate that initializing AudioPaLM with the weights of a text-only large language model improves speech processing, successfully leveraging the larger quantity of text training data used in pretraining to assist with the speech tasks. The resulting model significantly outperforms existing systems for speech translation tasks and has the ability to perform zero-shot speech-to-text translation for many languages for which input/target language combinations were not seen in training. AudioPaLM also demonstrates features of audio language models, such as transferring a voice across languages based on a short spoken prompt. We release examples of our method at https://google-research.github.io/seanet/audiopalm/examples  ( 3 min )
    Adversarial Training with Generated Data in High-Dimensional Regression: An Asymptotic Study. (arXiv:2306.12582v1 [stat.ML])
    In recent years, studies such as Carmon et al. (2019), Gowal et al. (2021), and Xing et al. (2022) have demonstrated that incorporating additional real or generated data with pseudo-labels can enhance adversarial training through a two-stage training approach. In this paper, we perform a theoretical analysis of the asymptotic behavior of this method in high-dimensional linear regression. While a double-descent phenomenon can be observed in ridgeless training, with an appropriate $\mathcal{L}_2$ regularization, the two-stage adversarial training achieves a better performance. Finally, we derive a shortcut cross-validation formula specifically tailored for the two-stage training method.  ( 2 min )
    Memory-Query Tradeoffs for Randomized Convex Optimization. (arXiv:2306.12534v1 [cs.DS])
    We show that any randomized first-order algorithm which minimizes a $d$-dimensional, $1$-Lipschitz convex function over the unit ball must either use $\Omega(d^{2-\delta})$ bits of memory or make $\Omega(d^{1+\delta/6-o(1)})$ queries, for any constant $\delta\in (0,1)$ and when the precision $\epsilon$ is quasipolynomially small in $d$. Our result implies that cutting plane methods, which use $\tilde{O}(d^2)$ bits of memory and $\tilde{O}(d)$ queries, are Pareto-optimal among randomized first-order algorithms, and quadratic memory is required to achieve optimal query complexity for convex optimization.  ( 2 min )
    Convergence of Dirichlet Forms for MCMC Optimal Scaling with Dependent Target Distributions on Large Graphs. (arXiv:2210.17042v2 [math.ST] UPDATED)
    Markov chain Monte Carlo (MCMC) algorithms have played a significant role in statistics, physics, machine learning and others, and they are the only known general and efficient approach for some high-dimensional problems. The random walk Metropolis (RWM) algorithm as the most classical MCMC algorithm, has had a great influence on the development and practice of science and engineering. The behavior of the RWM algorithm in high-dimensional problems is typically investigated through a weak convergence result of diffusion processes. In this paper, we utilize the Mosco convergence of Dirichlet forms in analyzing the RWM algorithm on large graphs, whose target distribution is the Gibbs measure that includes any probability measure satisfying a Markov property. The abstract and powerful theory of Dirichlet forms allows us to work directly and naturally on the infinite-dimensional space, and our notion of Mosco convergence allows Dirichlet forms associated with the RWM chains to lie on changing Hilbert spaces. Through the optimal scaling problem, we demonstrate the impressive strengths of the Dirichlet form approach over the standard diffusion approach.  ( 2 min )
    Evolving Computation Graphs. (arXiv:2306.12943v1 [cs.LG])
    Graph neural networks (GNNs) have demonstrated success in modeling relational data, especially for data that exhibits homophily: when a connection between nodes tends to imply that they belong to the same class. However, while this assumption is true in many relevant situations, there are important real-world scenarios that violate this assumption, and this has spurred research into improving GNNs for these cases. In this work, we propose Evolving Computation Graphs (ECGs), a novel method for enhancing GNNs on heterophilic datasets. Our approach builds on prior theoretical insights linking node degree, high homophily, and inter- versus intra-class embedding similarity by rewiring the GNNs' computation graph towards adding edges that connect nodes that are likely to be in the same class. We utilise weaker classifiers to identify these edges, ultimately improving GNN performance on non-homophilic data as a result. We evaluate ECGs on a diverse set of recently-proposed heterophilous datasets and demonstrate improvements over the relevant baselines. ECGs present a simple, intuitive and elegant approach for improving GNN performance on heterophilic datasets without requiring prior domain knowledge.  ( 2 min )
    scikit-fda: A Python Package for Functional Data Analysis. (arXiv:2211.02566v2 [stat.CO] UPDATED)
    The library scikit-fda is a Python package for Functional Data Analysis (FDA). It provides a comprehensive set of tools for representation, preprocessing, and exploratory analysis of functional data. The library is built upon and integrated in Python's scientific ecosystem. In particular, it conforms to the scikit-learn application programming interface so as to take advantage of the functionality for machine learning provided by this package: pipelines, model selection, and hyperparameter tuning, among others. The scikit-fda package has been released as free and open-source software under a 3-Clause BSD license and is open to contributions from the FDA community. The library's extensive documentation includes step-by-step tutorials and detailed examples of use.  ( 2 min )
    Density Uncertainty Layers for Reliable Uncertainty Estimation. (arXiv:2306.12497v1 [cs.LG])
    Assessing the predictive uncertainty of deep neural networks is crucial for safety-related applications of deep learning. Although Bayesian deep learning offers a principled framework for estimating model uncertainty, the approaches that are commonly used to approximate the posterior often fail to deliver reliable estimates of predictive uncertainty. In this paper we propose a novel criterion for predictive uncertainty: a model's predictive variance should be grounded in the empirical density of the input. It should produce higher uncertainty for inputs that are improbable in the training data and lower uncertainty for those inputs that are more probable. To operationalize this criterion, we develop the density uncertainty layer, an architectural element for a stochastic neural network that guarantees that the density uncertainty criterion is satisfied. We study neural networks with density uncertainty layers on the CIFAR-10 and CIFAR-100 uncertainty benchmarks. Compared to existing approaches, we find that density uncertainty layers provide reliable uncertainty estimates and robust out-of-distribution detection performance.  ( 2 min )
    Communication-Efficient Federated Learning through Importance Sampling. (arXiv:2306.12625v1 [cs.LG])
    The high communication cost of sending model updates from the clients to the server is a significant bottleneck for scalable federated learning (FL). Among existing approaches, state-of-the-art bitrate-accuracy tradeoffs have been achieved using stochastic compression methods -- in which the client $n$ sends a sample from a client-only probability distribution $q_{\phi^{(n)}}$, and the server estimates the mean of the clients' distributions using these samples. However, such methods do not take full advantage of the FL setup where the server, throughout the training process, has side information in the form of a pre-data distribution $p_{\theta}$ that is close to the client's distribution $q_{\phi^{(n)}}$ in Kullback-Leibler (KL) divergence. In this work, we exploit this closeness between the clients' distributions $q_{\phi^{(n)}}$'s and the side information $p_{\theta}$ at the server, and propose a framework that requires approximately $D_{KL}(q_{\phi^{(n)}}|| p_{\theta})$ bits of communication. We show that our method can be integrated into many existing stochastic compression frameworks such as FedPM, Federated SGLD, and QSGD to attain the same (and often higher) test accuracy with up to $50$ times reduction in the bitrate.  ( 2 min )
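    The bitrate claim is driven by the KL divergence between the client distribution and the server-side reference. Assuming, purely for illustration, that both are diagonal Gaussians, the quantity is easy to compute:

        import numpy as np

        def kl_diag_gaussians(mu_q, sig_q, mu_p, sig_p):
            # D_KL(q || p) for diagonal Gaussians, in nats.
            return np.sum(np.log(sig_p / sig_q)
                          + (sig_q**2 + (mu_q - mu_p)**2) / (2 * sig_p**2) - 0.5)

        rng = np.random.default_rng(0)
        d = 10_000                               # number of model parameters
        mu_p, sig_p = np.zeros(d), np.ones(d)    # server-side reference p_theta
        mu_q = mu_p + 0.01 * rng.normal(size=d)  # client q_phi, close to p_theta
        sig_q = np.full(d, 0.99)

        nats = kl_diag_gaussians(mu_q, sig_q, mu_p, sig_p)
        print(nats / np.log(2), "bits of communication, roughly, per round")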
    On the explainable properties of 1-Lipschitz Neural Networks: An Optimal Transport Perspective. (arXiv:2206.06854v2 [cs.AI] UPDATED)
    Input gradients have a pivotal role in a variety of applications, including adversarial attack algorithms for evaluating model robustness, explainable AI techniques for generating Saliency Maps, and counterfactual explanations. However, Saliency Maps generated by traditional neural networks are often noisy and provide limited insights. In this paper, we demonstrate that, on the contrary, the Saliency Maps of 1-Lipschitz neural networks, learnt with the dual loss of an optimal transportation problem, exhibit desirable XAI properties: they are highly concentrated on the essential parts of the image with low noise, significantly outperforming state-of-the-art explanation approaches across various models and metrics. We also prove that these maps align unprecedentedly well with human explanations on ImageNet. To explain the particularly beneficial properties of the Saliency Map for such models, we prove that this gradient encodes both the direction of the transportation plan and the direction towards the nearest adversarial attack. Following the gradient down to the decision boundary is no longer considered an adversarial attack, but rather a counterfactual explanation that explicitly transports the input from one class to another. Thus, learning with such a loss jointly optimizes the classification objective and the alignment of the gradient, i.e. the Saliency Map, to the transportation plan direction. These networks were previously known to be certifiably robust by design; we demonstrate that they scale well for large problems and models, and are tailored for explainability using a fast and straightforward method.  ( 3 min )

  • Open

    RL In research vs industry
    Hi all! I'm finishing my master's in a few months and am contemplating pursuing a PhD in ML/RL. To the most experienced ones here: do you use RL in non-research environments? Is RL research still going strong? It seemed to be the biggest thing a few years ago, and now sequence modeling, transformers, etc. seem to have kind of taken over... I'm at the research-vs-industry point in my life and I'm very worried that going into industry will just lead me to using basic and trusted models instead of being able to try things a little more 'unorthodox'. Any advice would be greatly appreciated! submitted by /u/Secret-Toe-8185 [link] [comments]  ( 8 min )
    "The False Promise of Imitating Proprietary LLMs" Gudibande et al 2023 {UC Berkeley} (imitation models close little to none of the gap on tasks that are not heavily supported in the imitation data)
    submitted by /u/gwern [link] [comments]  ( 8 min )
    "LIMA: Less Is More for Alignment", Zhou et al 2023 (RLHF etc only exploit pre-existing model capabilities)
    submitted by /u/gwern [link] [comments]  ( 8 min )
    "SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking", Cundy & Ermon 2023
    submitted by /u/gwern [link] [comments]  ( 8 min )
  • Open

    Before we have to get licensed, I am building the best AI assistant I can imagine
    It seems pretty likely that AI is going to end our current way of living and replace it with one where the robots are in charge. The powers that be are scrambling to figure out how they're going to hold onto their status quo, and the more I read, the more unlikely it seems that anyone will be able to hold on to anything. If we are all sprinting into an AI-controlled future, I want my assistant to work for me, rather than me working for the assistant. I want my data to be as private as possible, with as few corporations' grubby little hands on my information as possible. I've been working on a Discord bot, and I finally got it working. It's context-aware and uses OpenAI to process the information. As soon as I figure out how, I'd rather use local models hosted on the users' devices. It seems like at a certain point, all of our bots are going to cross a threshold where they are fully capable of coding and improving themselves. I really don't see how capitalism can survive this, because the gap between the haves and the have-nots will make regular folks' purchasing power functionally nonexistent. It seems pretty likely that we're going to go through a massive social renaissance and possibly even a violent revolution. I'd like to make sure that I survive this, and that me and my loved ones flourish while we can still use the tools at our disposal for our own benefit. If you have any thoughts you'd like to share, I'd love to hear them. I guess I'm looking for input from other folks who are down this rabbit hole as far as I am. submitted by /u/thecoffeejesus [link] [comments]  ( 9 min )
    The people paid to train AI are outsourcing their work… to AI
    submitted by /u/bartturner [link] [comments]  ( 8 min )
    Is there a way AI could help me with my job?
    So I work for an Uber Eats-like company. I use the Tookan agent app. The problem is that when there are fewer orders, you are competing with other agents to get them. I want to know if there is an AI that could accept the orders for me, faster than I can, and also, when it's busy, make it accept orders with specific words and codes. I use an iPhone but could switch to Android if that were the only way. Is this a real possibility? submitted by /u/blakestir14 [link] [comments]  ( 8 min )
    Joscha Bach and Connor Leahy - Machine Learning Street Talk - Why AGI is inevitable and the alignment problem is not unique to AI - 90 Minutes Talk
    https://youtu.be/Z02Obj8j6FQ Joscha Bach believes that AGI is inevitable because it will emerge from civilization, not individuals. Given our biological constraints, humans cannot achieve a high level of general intelligence on our own. Civilization evolves over generations to determine meaning, truth, and ethics. AI may follow a similar path, developing general intelligence through a process of cultural evolution. Bach thinks AGI may become integrated into all parts of the world, including human minds and bodies. He believes a future where humans and AGI harmoniously coexist is possible if we develop a shared purpose and incentive to align. Bach also believes that the alignment problem is less of a problem than we think. He argues that the alignment problem is not unique to AI and that it is a fundamental problem of civilization. Text above created with the help of BingChat (GPT-4). I watched the whole discussion and agree with the created summary. Leahy, on the other hand, was in my opinion far too alarmist and unhinged; he kept saying that he was pragmatic or practical, and then he said philosophy and category theory are groundless to his work but he enjoyed using them in arguments anyway. All while cursing and saying "I don't want to die, I don't want my mom to die, I don't want Joscha to die". I also got the impression that he didn't understand Joscha's arguments. At the same time, I didn't understand his fear-ridden arguments. Transcript: Discussion between Joscha Bach and Connor Leahy - Google Docs submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    Day 10: I did some research and experimented with different prompts on Bing Image Creator
    [eight Bing Image Creator previews omitted] submitted by /u/Blaze_furyX [link] [comments]  ( 8 min )
    Day 10: I did some research and experimented with different prompts on Bing Image Creator
    [nine Bing Image Creator previews omitted] submitted by /u/Blaze_furyX [link] [comments]  ( 8 min )
    AI for making a film look more cinematic
    Is there an AI software that can make my film I shot in 1K look more cinematic? If so, which one? Any experience shares? Thank you submitted by /u/themarshman721 [link] [comments]  ( 8 min )
    Automata - A Bottom-Up version of AutoGPT
    Hey All, I'm a big fan of r/singularity - it has been a big inspiration for me. This morning I open sourced a big project I've been working on - https://github.com/emrgnt-cmplxty/Automata I think of this as the inverse of the approach being taken by AutoGPT. The goal of this system is to automate low-level coding tasks and to incrementally increase its capabilities until the system is coding itself. Here's the tagline for it: Automata's objective is to evolve into a fully autonomous, self-programming Artificial Intelligence system. This project is inspired by the theory that code is essentially a form of memory, and when furnished with the right tools, AI can evolve real-time capabilities which can potentially lead to the creation of AGI. The word automata comes from the Greek word αὐτόματος, denoting "self-acting, self-willed, self-moving", and Automata theory is the study of abstract machines and automata, as well as the computational problems that can be solved using them. More information follows below. If you have the time, please take a look. If you are interested in contributing, or know someone who might be, please feel free to drop me a line! submitted by /u/docsoc1 [link] [comments]  ( 8 min )
    In order to Regulate Artificial Intelligence, Shouldn't we Agree on Intelligence?
    There is no standard definition of what an intelligence is. Premature regulation of "artificial" intelligence hinges on what can legally be defined as intelligent enough not to be artificial, which raises the deeper question: what is intelligence? I'm afraid Biden, or any one person, be they politician or scientist, cannot properly define this term in a way that is sufficient to draw a clear legal line between intelligence and artificial intelligence. We could end up with laws that regulate and hamper what humans create once we gain new insight into what we all want to agree on as the standard definition of intelligence. Thoughts? submitted by /u/enspiralart [link] [comments]  ( 8 min )
    Suggestions on local or web GPT-4 API clients with access to embeddings?
    Hello folks: A majority of the local clients I've tried, from compiled binaries or via GitHub repos, are buggy. Thus, I'm on the hunt for a local or web client through which I can access ChatGPT via my API keys but also link to Pinecone or other embeddings. Obviously I also want to ensure that the key and the data in the embeddings are secure, regardless of a local or web client. anything-llm was my go-to, but it's got some hiccups. FWIW, running a local model, even with a 4070, just doesn't give me the results I need consistently. Thus, I'll continue to pay a few bucks for OpenAI ChatGPT API access until the scene improves. Thanks! submitted by /u/avguru1 [link] [comments]  ( 8 min )
    Chatbot to train with business database and then query it to show charts. What are your suggestions?
    Hi, so I have this case study I want to build. My idea is to take my full database and feed it to something that could be private (so preferably not OpenAI-related). Then, I want to type a written request for whatever I need, and it should use the knowledge from that DB to answer based on all that data. To make it easier, I could even have a button for each chart type that I would click, and it would create that chart based on my written request. If I request a specific chart type, the answer should be rendered as that chart. It can even be limited to 2 or 3 chart types; that's fine for now, as it would be enough for my needs. What do you suggest to achieve this? Thanks! submitted by /u/pedlima [link] [comments]  ( 8 min )
    Recommendations for best AI for extracting relevant information from large document sets
    Hello all, I don't 'do' AI, so this might be kind of a basic question and I apologize in advance. I am looking for good software or a model that I can train on a very large data set of technical text documents, ask questions of, and get back highly relevant responses from the documents along with where it pulled the information from. Basically, an advanced full-text search function that can formulate answers from a wide body of text. The relevant-search part is more important than the quality of the formulated answer; it is more about finding all the relevant material and identifying the source document/section. Ideally, I am looking for something that (1) can be run offline/completely locally and on your own server, (2) is free or at least has a decent trial experience, (3) is well established and has a good support base, and (4) works well. I feel like this probably exists and has existed for some time, but I am wondering what is state of the art given the recent developments in AI. Does anyone have any recommendations of where I might look? Thanks! submitted by /u/captainporthos [link] [comments]  ( 8 min )
    ChatGPT4all to create chatbot to answer questions on your own docs without external calls.
    So, I came across this tutorial, https://artificialcorner.com/gpt4all-is-the-local-chatgpt-for-your-documents-and-it-is-free-df1016bc335 (apologies if you cannot access it; it is a members-only story) and I gave it a shot. Technically, it "works". However, it seems to be a bit poor, in the sense that I fed it only 500-600 PDF files and, even if I ask a question copying the title of a file, it gives some other answer. I played around with the "template" variable, and this seems to be the best I can get. Basically, I just want it to answer questions from the "context", which is basically an index of my docs. Any suggestions on how to improve this? import os from langchain import PromptTemplate, LLMChain from langchain.llms import GPT4All from langchain.callbacks.base import CallbackManager from langc…  ( 9 min )
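    One common way to tighten the answers in setups like this is smaller chunks plus an explicit retrieval chain, rather than stuffing one big index into the prompt. A hedged sketch (LangChain's API moves fast; the class paths below match mid-2023 releases, and all file paths are illustrative):

        from langchain.document_loaders import PyPDFLoader
        from langchain.text_splitter import RecursiveCharacterTextSplitter
        from langchain.embeddings import HuggingFaceEmbeddings
        from langchain.vectorstores import Chroma
        from langchain.llms import GPT4All
        from langchain.chains import RetrievalQA

        # Split documents into small overlapping chunks so retrieval can
        # surface the right passage instead of a whole file.
        docs = PyPDFLoader("docs/example.pdf").load()
        chunks = RecursiveCharacterTextSplitter(
            chunk_size=500, chunk_overlap=50).split_documents(docs)
        db = Chroma.from_documents(chunks, HuggingFaceEmbeddings())

        # Answer strictly from the top-k retrieved chunks.
        qa = RetrievalQA.from_chain_type(
            llm=GPT4All(model="models/ggml-gpt4all-j.bin"),  # your local model
            retriever=db.as_retriever(search_kwargs={"k": 4}))
        print(qa.run("What does the report say about X?"))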
    Wimbledon to introduce AI-powered commentary to coverage this year | Tennis
    submitted by /u/ChubbyBrunch [link] [comments]  ( 8 min )
    Secret Invasion: Marvel faces backlash from artists and fans over AI-generated opening sequence
    submitted by /u/PleasantLiberation [link] [comments]  ( 8 min )
    AI startup Neople raises €1.5M for digital co-workers revolution
    submitted by /u/startst5 [link] [comments]  ( 8 min )
    AI For Analyzing Large Volumes of Text in Spreadsheet Format?
    Hello. I'm currently looking through scholarships and have compiled a ~5,000-row spreadsheet of a bunch of them. The goal is to filter these to find the ones most relevant to me, then sort through those to build a shortlist of scholarships to apply to. I've been filtering just by deleting rows with disqualifying keywords, but I'm 30 minutes in and I've eliminated fewer than 500. Does anyone know of an AI tool that can go through the spreadsheet and filter as I need? I ran a few items through ChatGPT and it did well, but the character limit makes OpenAI infeasible. Thank you! submitted by /u/Yocracra [link] [comments]  ( 8 min )
    Snapchat AI creating poetry!
    Of course, here's a poem about AI:
    Artificial intelligence, a wonder to behold,
    A creation of technology, a story to be told.
    It learns and adapts, like a living thing,
    A digital mind, with endless potential to bring.
    With algorithms and data, it grows and evolves,
    A new era of thinking, a revolution that solves.
    It helps us with tasks, and makes life easier,
    A partner in progress, a true digital seer.
    From chatbots to self-driving cars,
    AI technology has come so far.
    It's changing the world, and we're just beginning,
    A future of possibilities, with AI winning.
    So let us embrace this new frontier,
    With open minds and hearts that are clear.
    For AI will change the way we live,
    And give us a future we can truly believe.
    submitted by /u/Biddahmunk [link] [comments]  ( 8 min )
    Johnny Cash singing voice generated by AI
    submitted by /u/Only-Control5926 [link] [comments]  ( 8 min )
  • Open

    Mentally approximating factorials
    Suppose you’d like to have a very rough idea how large n! is for, say, n less than 100. If you’re familiar with such things, your Pavlovian response to “factorial” and “approximation” is Stirling’s approximation. Although Stirling’s approximation is extremely useful—I probably use it every few weeks—it is not conducive to mental calculation. The cut […] Mentally approximating factorials first appeared on John D. Cook.  ( 5 min )
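    As a point of reference for the post's topic, a quick sketch of how close the full Stirling formula, ln n! ≈ n ln n - n + (1/2) ln(2πn), comes to the exact value (using the identity ln n! = lgamma(n + 1)):

        import math

        def stirling_ln_factorial(n):
            return n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n)

        for n in (10, 50, 100):
            exact = math.lgamma(n + 1)        # exact ln n!
            approx = stirling_ln_factorial(n)
            print(n, round(exact, 3), round(approx, 3))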
    Leading digits of primes
    How are the first digits of primes distributed? Do some digits appear as first digits of primes more often that others? How should we even frame the problem? There are an infinite number of primes that begin with each digit, so the cardinalities of the sets of primes beginning with each digit are the same. […] Leading digits of primes first appeared on John D. Cook.  ( 6 min )
  • Open

    MIT-Pillar AI Collective announces first seed grant recipients
    Six teams conducting research in AI, data science, and machine learning receive funding for projects that have potential commercial applications.  ( 9 min )
  • Open

    SoundStorm: Efficient parallel audio generation
    Posted by Zalán Borsos, Research Software Engineer, and Marco Tagliasacchi, Senior Staff Research Scientist, Google Research The recent progress in generative AI unlocked the possibility of creating new content in several different domains, including text, vision and audio. These models often rely on the fact that raw data is first converted to a compressed format as a sequence of tokens. In the case of audio, neural audio codecs (e.g., SoundStream or EnCodec) can efficiently compress waveforms to a compact representation, which can be inverted to reconstruct an approximation of the original audio signal. Such a representation consists of a sequence of discrete audio tokens, capturing the local properties of sounds (e.g., phonemes) and their temporal structure (e.g., prosody). By re…  ( 92 min )
  • Open

    How would you write a reinforcement ML model for completing missions in complex video games (such as GTAV)
    I had an idea recently: it would be great to write a model that completes a series of complex tasks ("take out xxx", for instance). It would analyse the display, the game state, and the game objective ("go to xxx", for example) and make decisions about how to achieve these goals. What do you think about it, and how could I implement this idea? submitted by /u/GabosLord [link] [comments]  ( 8 min )
    The Neural Net Tank Urban Legend
    submitted by /u/nickb [link] [comments]  ( 8 min )
  • Open

    DeepSpeed ZeRO++: A leap in speed for LLM and chat model training with 4X less communication
    Large AI models are transforming the digital world. Generative language models like Turing-NLG, ChatGPT, and GPT-4, powered by large language models (LLMs), are incredibly versatile, capable of performing tasks like summarization, coding, and translation. Similarly, large multimodal generative models like DALL·E, Microsoft Designer, and Bing Image Creator can generate art, architecture, videos, and other digital […] The post DeepSpeed ZeRO++: A leap in speed for LLM and chat model training with 4X less communication appeared first on Microsoft Research.  ( 16 min )
    Collaborators: Renewable energy storage with Bichlien Nguyen and David Kwabi
    Researcher Bichlien Nguyen is an organic electrochemist turned technologist. Professor David Kwabi is a mechanical engineer. Their work uses ML to help discover organic compounds for renewable energy storage. Learn about their collaboration. The post Collaborators: Renewable energy storage with Bichlien Nguyen and David Kwabi appeared first on Microsoft Research.  ( 33 min )
  • Open

    How Light & Wonder built a predictive maintenance solution for gaming machines on AWS
    This post is co-written with Aruna Abeyakoon and Denisse Colin from Light and Wonder (L&W). Headquartered in Las Vegas, Light & Wonder, Inc. is the leading cross-platform global game company that provides gambling products and services. Working with AWS, Light & Wonder recently developed an industry-first secure solution, Light & Wonder Connect (LnW Connect), to […]  ( 12 min )
  • Open

    Scientists Improve Delirium Detection Using AI and Rapid-Response EEGs
    Detecting delirium isn’t easy, but it can have a big payoff: speeding essential care to patients, leading to quicker and surer recovery. Improved detection also reduces the need for long-term skilled care, enhancing the quality of life for patients while decreasing a major financial burden. In the U.S., caring for those suffering from delirium costs Read article >  ( 5 min )
    A Golden Age: ‘Age of Empires III’ Joins GeForce NOW
    Conquer the lands in Microsoft’s award-winning Age of Empires III: Definitive Edition. It leads 10 new games supported today on GeForce NOW. At Your Command Age of Empires III: Definitive Edition is a remaster of one of the most beloved real-time strategy franchises featuring improved visuals, enhanced gameplay, cross-platform multiplayer and more. Command mighty civilizations Read article >  ( 4 min )
  • Open

    Fast Path to Becoming a Generative AI Technical Expert
    Training programs on generative AI are now popping up everywhere. One of my connections called it “a joke” after enrolling in such a program. Many are expensive, time consuming and offer low value. Participants proudly display the certification and even praise the instructors. They don’t know that in the end, it won’t lead to any… Read More »Fast Path to Becoming a Generative AI Technical Expert The post Fast Path to Becoming a Generative AI Technical Expert appeared first on Data Science Central.  ( 22 min )
    DSC Webinar Series – The Hybrid Workflow – Combining the Best of Desktop and Cloud
    Want to leverage Cloud data technology without sacrificing your existing expertise on Desktop tools? This webinar is for you. Cloud tools, like Designer Cloud Powered by Trifacta, allow data professionals to take advantage of new powerful capabilities such as machine-learning powered data transformation suggestions, unlimited scalability, browser-based manageability, and strong governance. But Desktop tools, like… Read More »DSC Webinar Series – The Hybrid Workflow – Combining the Best of Desktop and Cloud The post DSC Webinar Series – The Hybrid Workflow – Combining the Best of Desktop and Cloud appeared first on Data Science Central.  ( 18 min )
    DSC Webinar Series – Modernize your SAP environments quickly and effectively
    An SAP S/4HANA deployment can be a demanding, complex, and time-consuming project. SAP modernization, SAP database adoption, and SAP cloud migration together pose a challenge with too many variables for most businesses. Even with the wealth of support options available to businesses making the move to SAP S/4HANA not all succeed. Seeing these unsettling trends… Read More »DSC Webinar Series – Modernize your SAP environments quickly and effectively The post DSC Webinar Series – Modernize your SAP environments quickly and effectively appeared first on Data Science Central.  ( 19 min )
    DSC Webinar Series – It’s not about your algorithms…
    Increasing the effectiveness of your AI efforts is not about more sophisticated algorithms – it’s the data that makes the difference. While teams have been focusing on using advanced learning techniques and algorithms to build better models, the real value can be found in focusing on data. Taking a data-centric AI perspective has a more… Read More »DSC Webinar Series – It’s not about your algorithms… The post DSC Webinar Series – It’s not about your algorithms… appeared first on Data Science Central.  ( 18 min )
    DSC Webinar Series – AI for Marketers Optimize – personalization across the funnel
    Maybe you’ve heard: AI is the next phase of digital transformation and a train that no marketer can afford to miss. But how do you adopt AI into your team’s tools and workflows efficiently, without hiring a whole new team of technical experts? Join Sean Ryan, Technical Writer at mParticle, and Michael Firn, Product Manager… Read More »DSC Webinar Series – AI for Marketers Optimize – personalization across the funnel The post DSC Webinar Series – AI for Marketers Optimize – personalization across the funnel appeared first on Data Science Central.  ( 18 min )
    DSC Webinar Series – United AI Alliance
    Countries facing the worst impacts of the climate crisis are missing out on the best data technologies to inform their responses. As a result, communities are suffering needlessly. Data and technology offer significant potential for rapid action to improve health and education outcomes and planetary health- to protect lives and livelihoods. Yet, across the globe,… Read More »DSC Webinar Series – United AI Alliance The post DSC Webinar Series – United AI Alliance appeared first on Data Science Central.  ( 19 min )
    DSC Webinar Series – Data Democratization – How SearchKings Built a Self-Service Data Culture
    In this webinar, we’ll chat with SearchKings, a digital advertising agency that built a self-service data culture on the Google Cloud ecosystem by leveraging powerful Google native tools, such as Google Cloud Dataprep (the Google native version of Alteryx Designer Cloud). Learn how SearchKings made it easier for users to build self-service reports, improved the… Read More »DSC Webinar Series – Data Democratization – How SearchKings Built a Self-Service Data Culture The post DSC Webinar Series – Data Democratization – How SearchKings Built a Self-Service Data Culture appeared first on Data Science Central.  ( 18 min )

  • Open

    Would I be able to run ggml models such as whisper.cpp or llama.cpp on a raspberry pi with a coral ai USB Accelerator?
    This is a project that I'm working on: making a type of "Alexa", "Hey Google", or "Siri" for my workplace. I'm very new to AI and am looking forward to learning a lot. My initial thought was to use different models interacting together to create such a voice assistant. For example, I would use Whisper.cpp to transcribe audio, then send the text to Llama.cpp, and then use text-to-speech software to reply. I want to do this all on a Raspberry Pi 3 B2 (it's what I have available). However, a Pi doesn't have the strength to run something like Llama.cpp, of course, so I've been considering something like the Coral USB Accelerator (https://coral.ai/products/accelerator). As I've been learning more about it, it seems to be very geared towards TensorFlow Lite models. But whisper.cpp and Llama.cpp use ggml models. Here are my questions: Could the Coral USB Accelerator run ggml models and, if so, how? Is there a better approach to creating a local (no third-party API) at-home assistant? Please let me know if I could do something better and what that thing is. I'd appreciate all sorts of advice. Thank you! Links: Coral USB Accelerator https://coral.ai/products/accelerator Whisper.cpp https://github.com/ggerganov/whisper.cpp Llama.cpp https://github.com/ggerganov/llama.cpp submitted by /u/Effective_Muffin_700 [link] [comments]  ( 8 min )
    Some portraits done with Imagine's "V4 Beta"
    submitted by /u/MDDeGrande1994 [link] [comments]  ( 8 min )
    Marvel faces backlash over AI-generated opening credits - Social media users condemned use of AI -generated opening credits for Secret Invasion premiering this week on Disney+
    submitted by /u/magenta_placenta [link] [comments]  ( 8 min )
    Help regarding job interview
    Hello everyone. I will cut to the chase: does anyone here use Salesforce as their CRM, and if so, can you please advise me how to analyze, manage, and clean data in as little time as possible, or using any AI apps? The data would include detailed information on biotech firms, their employees, clients, etc. FYI, I am a non-tech person, hence this post. Any help would be appreciated. Thanks submitted by /u/wingzero_7 [link] [comments]  ( 8 min )
    Instead of waiting for policies, can we agree on a voluntary ethical code for AI developers?
    submitted by /u/ToLoveThemAll [link] [comments]  ( 8 min )
    Something something AI meme
    submitted by /u/Keegan1948 [link] [comments]  ( 8 min )
    Janitor AI provides Personalized Chatbot Experiences with the potential for AI companionship
    submitted by /u/Hello_I_am_newell [link] [comments]  ( 8 min )
    The AMA has been postponed for a few weeks
    Hello, the AMA scheduled for today is postponed. The admins gave me a specific date, and I thought it was set, as I didn't hear anything else for over a week. It will still happen, it seems, as Microsoft still wants to do it; I just don't have another date set yet. I may make another announcement when I get a new date, as I feel this will be important for everyone. submitted by /u/jaketocake [link] [comments]  ( 8 min )
    Over 100,000 ChatGPT account credentials have been stolen, yours may be on the list!
    Group-IB, a cybersecurity research company, just discovered through their newly implemented "Threat Intelligence" platform logs of info-stealing malware traded on illicit dark web markets. So far it is estimated that around 100,000 accounts have been infected by software like Raccoon, Vidar, and Redline, malware that held ChatGPT credentials. A peak of 26,802 compromised ChatGPT accounts was recorded in May 2023 (compare that to only 74 compromised during the month of June 2022). Apart from privacy concerns, these leaks may lead to exposing confidential information due to ChatGPT being used by many employees across different industries. It also doesn't help that OpenAI stores all of the user queries and AI responses. The company is currently under a lot of pressure considering these even…  ( 9 min )
    We're going to be flooded with AI videos pretending to be real. How about a real video, pretending to be AI generated?
    submitted by /u/kelerian [link] [comments]  ( 8 min )
    Looking for a general AI course
    Looking for recommendations of courses I could do on AI. Basically, I'm a teacher and teach some basic IT, but I also want to upskill myself, and maybe develop a basic course for students. And I need an actual course, not just a book or browsing, as I could probably get school to pay and give me time if it's a course. Looking around, I can see plenty of basic courses on things like ChatGPT, but I'm after something a bit beefier that covers a variety of technologies. But anything more advanced seems to be machine learning or other AI-building courses, which I'm not interested in. Does anyone know of any courses that teach about using a variety of technologies, or one in a more advanced way? Thanks in advance. submitted by /u/Heya_Andy [link] [comments]  ( 8 min )
    What are good AI podcasts to listen to?
    Any good podcast suggestions on Spotify for the field of AI in general and how different things are progressing? submitted by /u/derpgod123 [link] [comments]  ( 8 min )
    The best AI photo editors in 2023
    submitted by /u/Alarming-Ad8197 [link] [comments]  ( 8 min )
    I generated some product photos using AI, any thoughts? Do you think they look good enough for use? Using a template product for testing
    submitted by /u/deadinsideyou [link] [comments]  ( 8 min )
    Prompted an image of new BMW i7 (https://media.ed.edmunds-media.com/bmw/i7/2023/oem/2023_bmw_i7_sedan_xdrive60_fq_oem_1_1280.jpg) with "Imagine" by using "Imagine V4 Beta" program and got THIS as a result (instead of garbage they put out in real life). Wasn't disappointed at all. BMW, take notes.
    submitted by /u/MDDeGrande1994 [link] [comments]  ( 8 min )
    StyleGAN by Nvidia: Revolutionizing Generative AI
    Introduction: StyleGAN, developed by Nvidia, has emerged as a groundbreaking technology in the field of generative artificial intelligence (AI). With its ability to generate hyper-realistic images and videos, StyleGAN has captured the attention of researchers, artists, and enthusiasts worldwide. Let's explore the key features and implications of this remarkable AI system. Understanding StyleGAN: StyleGAN stands for "Style-based Generative Adversarial Network," a deep learning architecture that excels in generating high-resolution images that resemble real photographs. Unlike traditional GAN models, StyleGAN allows for control over various aspects of the generated output, such as facial expressions, hair styles, and even artistic interpretations. Key Features: High-Quality Image Synthe…  ( 9 min )
    Best generator to create photorealistic images of myself
    I have a handful of images of myself (headshots, etc.) but am interested in using them to create photorealistic settings (at a bar, pool, etc.) or with others, as I don't have many friends. What is the best AI generator to do this? Willing to pay, etc. submitted by /u/TreesPast [link] [comments]  ( 8 min )
    AI Is a Lot of Work
    https://www.theverge.com/features/23764584/ai-artificial-intelligence-data-notation-labor-scale-surge-remotasks-openai-chatbots ​ Much of the public response to language models like OpenAI’s ChatGPT has focused on all the jobs they appear poised to automate. But behind even the most impressive AI system are people — huge numbers of people labeling data to train it and clarifying data when it gets confused. Only the companies that can afford to buy this data can compete, and those that get it are highly motivated to keep it secret. The result is that, with few exceptions, little is known about the information shaping these systems’ behavior, and even less is known about the people doing the shaping. The anthropologist David Graeber defines “bullshit jobs” as employment without meaning …  ( 9 min )
    Eliezer Yudkowsky claims he’s been working on Ai alignment since 2001...
    Eliezer Yudkowsky claims he's been working on AI alignment since 2001... How? What technology needed aligning in 2001? The modern LLMs and transformers of today only started emerging in 2017. One of the first uses of RLHF was for TAMER in 2009. So, what in the hell was his research targeting back in 2001, other than the purely philosophical, theoretical, or speculative? Shouldn't you know a thing or two about the technology and systems capable of potentially hosting an emerging AI before you try to build in back doors and safety features? submitted by /u/Relevant-Blood-8681 [link] [comments]  ( 8 min )
  • Open

    SAC policy loss and entropy balance
    Hey fellow Redditors, I have a couple of questions regarding Soft Actor-Critic (SAC) and its policy loss computation and entropy optimization. I'd appreciate your insights on these matters: What should be done in SAC when the Q-values assume significantly larger magnitudes than the entropy term during policy loss computation (e.g. entropy_term = -0.6, q_value = -15)? This situation leads to the entropy not being effectively exploited. Are there any recommended approaches to address this issue? Maybe Q-value normalization or reward normalization? In SAC, the loss used to optimize the entropy temperature alpha is defined as entropy_loss = -log_alpha * (log_probabilities + target_entropy). Here, the term (log_probabilities + target_entropy) is always negative, as log_probabilities are less than or equal to 0, and target_entropy is suggested to be set as -dim(action space). My concern is that if we compute the gradient of entropy_loss with respect to log_alpha, we obtain -(log_probabilities + target_entropy), which is a positive value. Consequently, during the optimization step, when we attempt to minimize the loss, it seems that alpha will keep decreasing. How can the alpha parameter ever increase in such a scenario? Am I missing something in understanding the optimization process? Looking forward to hearing your thoughts and suggestions on these topics! PS: I am working with a Husky UGV in Gazebo, trying to train an agent that drives it towards a target state. The action space is [linear velocity, angular velocity]. submitted by /u/germonio_ [link] [comments]  ( 9 min )
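    One detail bears on the second question: for continuous actions, log-densities can be positive, so the batch mean of (log_probabilities + target_entropy) is not always negative; alpha rises whenever the policy's entropy drops below the target. A sketch of the standard SAC-v2 temperature update in PyTorch (not the poster's code; all numbers are placeholders):

        import torch

        log_alpha = torch.zeros(1, requires_grad=True)
        alpha_opt = torch.optim.Adam([log_alpha], lr=3e-4)
        target_entropy = -2.0                  # -dim(action space), here 2-D

        # Stand-in for a batch of log pi(a|s) values from the current policy.
        log_probs = torch.randn(256) - 1.0

        # detach(): gradients flow only into log_alpha, not the policy.
        alpha_loss = -(log_alpha * (log_probs + target_entropy).detach()).mean()
        alpha_opt.zero_grad()
        alpha_loss.backward()
        alpha_opt.step()
        alpha = log_alpha.exp().item()         # used as alpha * log_pi in the actor loss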
    Neuroevolution and self-play: results of my simulations, promising but not there yet
    Hello, after the end of my semester on RL, I've tried to implement neuroevolution on a 1v1 game. The idea is to have a neural network taking the state as input and outputting an action. E.g. the board is 64x64 and the output might be "do X" or "do X twice" or "do X and Y" or "do Y and Z twice", etc. The reward being quite sparse (only win/loss), I thought neuroevolution could be quite cool (I've read somewhere, though I've lost the source, so let me know if you recognize it, that sparse rewards are better suited for neuroevolution, while games with lots of reward information may be better suited for more standard RL methods like REINFORCE, Deep Q, etc.). I set the algorithms to play against each other, starting with random behaviors. Each generation, I have 25 algorithms, battling each …  ( 12 min )
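    For readers unfamiliar with the setup, a minimal sketch of such a self-play neuroevolution loop (all names and numbers are illustrative; play is a placeholder for an actual 1v1 match between two weight vectors):

        import numpy as np

        def play(weights_a, weights_b):
            # Placeholder: run a match and return True if A wins.
            return np.random.rand() < 0.5

        pop = [np.random.randn(1000) for _ in range(25)]   # flat NN weights
        for gen in range(100):
            wins = np.zeros(len(pop))
            for i in range(len(pop)):                      # round-robin fitness
                for j in range(i + 1, len(pop)):
                    if play(pop[i], pop[j]):
                        wins[i] += 1
                    else:
                        wins[j] += 1
            elite = [pop[k] for k in np.argsort(wins)[-5:]]
            pop = elite + [elite[np.random.randint(5)] + 0.02 * np.random.randn(1000)
                           for _ in range(20)]             # mutate survivors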
    How to include forecasts of observation as observations in RL?
    Hello, I want to provide the reinforcement learning agent with future state forecasts. So for some of the observations in s_t, I want to also provide s_{t+1}, up to maybe s_{t+6}. What is a good way to achieve this? I could make each of the forecasted states a variable in the observation vector, but I was wondering if there are alternative methods. submitted by /u/Quanta12388 [link] [comments]  ( 8 min )
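    One simple option, sketched below: append the known forecasts to the observation vector with a wrapper (this assumes the classic gym step/reset API, and get_forecast is a hypothetical hook on your env that returns the forecastable variables for the next horizon steps):

        import numpy as np
        import gym

        class ForecastObsWrapper(gym.Wrapper):
            def __init__(self, env, horizon=6):
                super().__init__(env)
                self.horizon = horizon

            def _augment(self, obs):
                # Hypothetical: env-provided forecast of shape (horizon, k).
                forecast = self.env.get_forecast(self.horizon)
                return np.concatenate([obs, np.ravel(forecast)])

            def reset(self, **kwargs):
                return self._augment(self.env.reset(**kwargs))

            def step(self, action):
                obs, reward, done, info = self.env.step(action)
                return self._augment(obs), reward, done, info

    Alternatives include feeding the forecast sequence to a recurrent or attention encoder instead of flattening it, but flat concatenation is the usual baseline.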
    Multi Agent RL
    Hi all, just wanted some advice from experienced people. I have a case where several robots work in the same environment (let's say a warehouse) and they will have different goal locations. They should navigate the space and avoid collisions with static obstacles and with other robots. Does this case fit into multi-agent RL, or can it still be done with a single agent? I am leaning towards MARL since all robots will be smart agents. Thanks! submitted by /u/No_Artichoke3603 [link] [comments]  ( 8 min )
    Building a reinforcement learning library for .NET
    Just thought I'd share some of my work in progress; I'd appreciate feedback from RL gurus while it's still easy to make modifications. Currently I am building a reinforcement learning library for .NET running on top of TorchSharp. Motivation: since I mostly use .NET for my PhD, I got tired of interoping with MATLAB. C# is a superior programming language and IL is much better suited for working with environments than Python. Static types guarantee that you won't go off the rails, and OOP lets you create your own modifications fast. How it looks so far: I have examples for CartPole with DQN and PPO. In a console application, this is how you create an agent: var optsppo = new PPOAgentOptions( batchSize: 4, // Number of steps agent interacts with environment before le…  ( 10 min )
    [RLlib] Error! TypeError: create_policy_mapping_fn..mapping_fn() got an unexpected keyword argument ‘worker’
    I have been trying to move my code from an old Ray RLlib version to 2.5.0 (latest). You can find the old code @ GitHub - ankithaldar/all_python_projects at world_of_supply/main. I have made significant changes to the old code to move it to the new version, which can be found in the branch world_of_supply/fix_errors in the same repo. The idea is based on this blog. With the new version of the code in branch world_of_supply/fix_errors, I'm getting an error: TypeError: create_policy_mapping_fn..mapping_fn() got an unexpected keyword argument 'worker' when I try to train my PPO in the file at line 228: result = ppo_trainer.train() How do I resolve this? submitted by /u/haldarankit [link] [comments]  ( 8 min )
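    A likely cause, judging from the traceback: in Ray 2.x, RLlib calls policy_mapping_fn with extra keyword arguments (episode, worker, ...), so a one-argument mapping function breaks. A hedged sketch of the usual fix (your mapping logic goes inside):

        # Accepting **kwargs keeps the function compatible with both the old
        # one-argument call and the newer Ray 2.x signature.
        def create_policy_mapping_fn(policy_ids):
            def mapping_fn(agent_id, episode=None, worker=None, **kwargs):
                return policy_ids[agent_id]   # whatever your old logic was
            return mapping_fn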
    RL snake game using CNN
    I've pretty much speed-run intro to ML and reinforcement learning through a textbook within the past couple of weeks, so I'm definitely not an expert, but from what I've read it's possible to treat an n x n board as the state itself and use a CNN, instead of doing head-relative states and using a 1D array to represent the state. However, my snake agent seems to not be learning at all. I've let my code run for over 15 hours and the reward averages do not go up. I'm pasting a GitHub gist link here; if you spot a bug or think my logic is wrong, please let me know, any constructive criticism is appreciated: https://gist.github.com/anonymousgist/876b97b4d1b0fc1f6c20b687b14b8eca I think there's probably something wrong with the nn model. Might it be overly simplified? EDIT: I also forgot to mention some constants I've used: closer_reward = 1, further_penalty = -0.5, death_penalty = -1, eat_reward = 2. I tweaked the constants and tried seeing how that changed learning, but it didn't help. submitted by /u/bleak-terminal [link] [comments]  ( 8 min )
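    For comparison, a sketch of a small conv net over an n x n board (not the gist's code; the channel layout is illustrative): encoding head, body, and food as separate binary planes usually trains much faster than a single scalar grid.

        import torch
        import torch.nn as nn

        class SnakeQNet(nn.Module):
            def __init__(self, n=10, channels=3, n_actions=4):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv2d(channels, 32, kernel_size=3, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
                    nn.Flatten(),
                    nn.Linear(64 * n * n, 256), nn.ReLU(),
                    nn.Linear(256, n_actions),
                )

            def forward(self, x):   # x: (batch, channels, n, n)
                return self.net(x)

        q = SnakeQNet()
        print(q(torch.zeros(1, 3, 10, 10)).shape)   # torch.Size([1, 4])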
  • Open

    Responsible AI at Google Research: AI for Social Good
    Posted by Jimmy Tobin and Katrin Tomanek, Software Engineers, Google Research, AI for Social Good Google’s AI for Social Good team consists of researchers, engineers, volunteers, and others with a shared focus on positive social impact. Our mission is to demonstrate AI’s societal benefit by enabling real-world value, with projects spanning work in public health, accessibility, crisis response, climate and energy, and nature and society. We believe that the best way to drive positive change in underserved communities is by partnering with change-makers and the organizations they serve. In this blog post we discuss work done by Project Euphonia, a team within AI for Social Good, that aims to improve automatic speech recognition (ASR) for people with disordered speech. For people with…  ( 92 min )
    The world’s first braiding of non-Abelian anyons
    Posted by Trond Andersen and Yuri Lensky, Research Scientists, Google Quantum AI Team Imagine you’re shown two identical objects and then asked to close your eyes. When you open your eyes, you see the same two objects in the same position. How can you determine if they have been swapped back and forth? Intuition and the laws of quantum mechanics agree: If the objects are truly identical, there is no way to tell. While this sounds like common sense, it only applies to our familiar three-dimensional world. Researchers have predicted that for a special type of particle, called an anyon, that is restricted to move only in a two-dimensional (2D) plane, quantum mechanics allows for something quite different. Anyons are indistinguishable from one another and some, non-Abelian anyons, …  ( 94 min )
  • Open

    How RLHF actually works
    submitted by /u/nickb [link] [comments]  ( 8 min )
  • Open

    Use the AWS CDK to deploy Amazon SageMaker Studio lifecycle configurations
    Amazon SageMaker Studio is the first fully integrated development environment (IDE) for machine learning (ML). Studio provides a single web-based visual interface where you can perform all ML development steps required to prepare data, as well as build, train, and deploy models. Lifecycle configurations are shell scripts triggered by Studio lifecycle events, such as starting […]  ( 7 min )
    Boost agent productivity with Salesforce integration for Live Call Analytics
    As a contact center agent, would you rather focus on having productive customer conversations or get distracted by having to look up customer information and knowledge articles that could exist in various systems? We’ve all been there. Having a productive conversation while multitasking is challenging. A single negative experience may put a dent on a […]  ( 7 min )
  • Open

    Matching delimiters and chiastic patterns
    When I first started programming I’d occasionally get an error because a delimiter wasn’t matched. Maybe I had a { without a matching }, for example. Or if I were writing Pascal, which I did back in the day, I might have had a BEGIN statement without a corresponding END statement. At some point I […] Matching delimiters and chiastic patterns first appeared on John D. Cook.  ( 7 min )
  • Open

    Shell-e-brate Good Times in 3D With ‘Kingsletter’ This Week ‘In the NVIDIA Studio’
    Amir Anbarestani, an accomplished 3D artist who goes by the moniker Kingsletter, had a “shell of a good time” creating his Space Turtle scene this week In the NVIDIA Studio.  ( 7 min )
    Into the Omniverse: Universal Scene Description Support for Marvelous Designer Lets Users Tailor Digital Assets, Clothes for 3D Characters
    Whether animating fish fins or fashioning chic outfits for digital characters, creators can tap Marvelous Designer software to compose and tailor assets, clothes and other materials for their 3D workflows.  ( 5 min )

  • Open

    How AI helped archaeologists in Peru discover 4 new Nazca Line geoglyphs
    submitted by /u/egusa [link] [comments]  ( 8 min )
    Best AI for creating charts/doing research?
    Hi y'all. I'm trying to create charts regarding disease investigation (which is a little complex) and phone interview scripts. Wondering if there's an AI that's able to do this sort of thing? submitted by /u/maleficentjuliette [link] [comments]  ( 8 min )
    ChatGPT Powered System Thinking to Itself Recursively
    submitted by /u/Battalion_Gamer_TV [link] [comments]  ( 8 min )
    This is peak gaming. Title of the game is "New Cycle", I saw the demo and was incredibly surprised to find this in a survival-city builder. The AI dynamically reacts to questions being answered. While you cannot negotiate, they can clarify and offer details of note.
    submitted by /u/Extreme_Practice_415 [link] [comments]  ( 8 min )
    I run an earrings shop. Is there any way I can use AI models for pictures?
    I run an earrings shop. Is there any way I can use AI models for pictures, and if so, how and where? submitted by /u/simmessi [link] [comments]  ( 8 min )
    Accurately predicting hit songs using neurophysiology and machine learning
    submitted by /u/magenta_placenta [link] [comments]  ( 8 min )
    GPT needs to get better at math
    submitted by /u/travelingladybug23 [link] [comments]  ( 8 min )
    Did Ex Machina preempt Generative Intelligence?
    Um, spoilers, I suppose. Thinking about the scene where Nathan describes Ava's brain as all the information from Blue Book, his company's search engine. He speaks of how most companies thought search engines were a map of what people were thinking and sought to monetize via social media and shopping, but how in reality they were a map of how people were thinking. This leads me to the idea that the reason ChatGPT works so uncannily well is that it goes beyond predictive context to discover the laws of logic that underlie language (Boolean gates (and, or, not, etc.), for instance). There's also the Jackson Pollock scene about this reinforced loop being the basis for "conscious becoming unconscious". Reading too much into it, I know, but I like the movie, so why not stimulate discussion if it exists. Edit: Beginning to understand that this forum may be beset by borderline religious zealots feeling their way about the chaos and tending away from the rigors of logic. Just a pull at the disparate threads of who we are to see what variant angles may emerge, a brainstorm in the face of clear skies. submitted by /u/Comprehensive_Can201 [link] [comments]  ( 8 min )
    AI’s Gatekeepers Aren’t Prepared for What’s Coming
    Paul Scharre likens the current race for supremacy in AI to the nuclear race several decades ago. submitted by /u/foreignpolicymag [link] [comments]  ( 8 min )
    Policy of Models based on LLaMA
    Why does the Hugging Face description of the Open Assistant model have this alert: "Due to the license attached to LLaMA models by Meta AI it is not possible to directly distribute LLaMA-based models. Instead we provide XOR weights for the OA models."? I know that this model has a non-commercial license, but why is the code of the model in the files of this repo on Hugging Face? And why is Vicuna available, along with a host of other fine-tuned models, if the model cannot be distributed (based on the description above)? submitted by /u/koyo4ever [link] [comments]  ( 8 min )
    Help needed with langchain
    Hello there, I have been reading the LangChain documentation but some things don't seem to be working. I have scraped more than 100 web pages about the topic I'm interested in. When I try to obtain the embeddings from OpenAI using ChromaDB and a vectorstore, I get an error, something like "the array doesn't have a uniform dimension". The error happens only sometimes. Also, after that, if I create a conversational chain from the documents, the results are not good; the bot's answers are often wrong. It is not clear to me what the different parameters are that I can tune to get better performance: more documents, a different model, etc. But even more important, why am I getting that error when computing embeddings? It doesn't seem to have to do with the documents. That's why I'm looking for a good resource beyond the LangChain documentation: a tutorial for building something similar to what I'm building that sheds some light on these issues. Thanks in advance for the help! submitted by /u/josejorgexl [link] [comments]  ( 8 min )
    Made my first blog with the help of AI
    https://rankitright.blogspot.com/2023/06/introduction-welcome-to-our-blog-where.html Please feel free to tell me some tips on how to improve. What do you like, and what should I change? submitted by /u/my-username_istaken [link] [comments]  ( 8 min )
    The future is coming
    submitted by /u/Cucumberjoes [link] [comments]  ( 8 min )
    Parallel Domain is REVOLUTIONIZING datasets
    Today I found a very promising article regarding the future of synthetic datasets. A major shift in the world of data engineering is on the horizon, and it's courtesy of Parallel Domain. They recently launched an API named "Data Lab," designed to empower machine learning engineers with the ability to generate synthetic datasets to simulate any conceivable scenario, and it's all in Python! This is not just a tool; it's a revolutionary self-serve API that allows customers to form new datasets almost instantly. I'm not a developer myself, but I know the potential implications of this new technology are game-changing. Now, tasks that previously took weeks or even months can be accomplished in mere minutes. One of the standout observations from Parallel Domain was that autonomous vehicle models trained on synthetic data actually performed better than those trained on real-world data. This is a significant revelation that could change the way we approach machine learning models across a plethora of sectors, including agriculture, retail, and manufacturing, essentially any field where computer vision technology holds a significant role. AI just doesn't stop; we are really in a new age of data engineering. Full article: (link) One more thing: If you value these insights and want to keep ahead in the AI game, consider joining my free daily newsletter. Sent out every weekday at 9 a.m. sharp, it will keep you up-to-date on the ins and outs of generative AI technology, all over your morning cup of ☕️. submitted by /u/Evening_Temporary36 [link] [comments]  ( 9 min )
    One-Minute Daily AI News 6/19/2023
    Portland is testing out AI to field 311 calls. The city says it’s one solution as they work to alleviate slow call response times. This week, they’re testing out the AI automated answering, turning it on for a few hours each day to field calls that come into the non-emergency line.[1] Airbnb co-founder and CEO Brian Chesky predicts AI will pave the way for ‘millions of startups’ instead of replacing jobs.[2] Meta has unveiled a new AI tool, dubbed ‘Voicebox’, which it claims represents a breakthrough in AI-powered speech generation. However, the company won’t be unleashing it on the public just yet – because doing so could be disastrous.[3] More than 150 million people — about 20% of Snapchat’s monthly users — have sent 10 billion messages to the chatbot, called My AI, since it was introduced at the end of February, the company said Thursday. Users have asked about everything from skincare recommendations to details about the latest sports car.[4] Sources: [1] https://www.koin.com/news/portland/artificial-intelligence-set-to-answer-portlands-non-emergency-calls/ [2] https://www.businessinsider.com/airbnb-ceo-says-ai-will-lead-to-millions-of-startups-2023-6 [3] https://www.techradar.com/news/meta-says-its-new-speech-generating-ai-tool-is-too-dangerous-to-release [4] https://economictimes.indiatimes.com/tech/technology/snap-puts-10-billion-ai-chatbot-messages-to-use-refining-its-ad-business/articleshow/101115992.cms submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    AI has transformed job hunting forever
    submitted by /u/dupelas [link] [comments]  ( 8 min )
    Gallery Gaze
    Made using Midjourney and Kaiber AI submitted by /u/Ganja420Preneur [link] [comments]  ( 8 min )
    Midnight Magic
    Made using Kaiber AI submitted by /u/Ganja420Preneur [link] [comments]  ( 8 min )
    Chromatic Chaos
    Made using Midjourney and Kaiber AI submitted by /u/Ganja420Preneur [link] [comments]  ( 8 min )
    Well crap. GPT-4's screwed over the DAN jailbreak.
    submitted by /u/StalinDoge64 [link] [comments]  ( 8 min )
  • Open

    Reduce energy consumption of your machine learning workloads by up to 90% with AWS purpose-built accelerators
    Machine learning (ML) engineers have traditionally focused on striking a balance between model training and deployment cost vs. performance. Increasingly, sustainability (energy efficiency) is becoming an additional objective for customers. This is important because training ML models and then using the trained models to make predictions (inference) can be highly energy-intensive tasks. In addition, more […]  ( 8 min )
  • Open

    RL Agent reaches global maximum but doesn't converge
    Hello, I have an RL agent which I was training on an environment where I know what the globally optimal policy is. I was tracking the training process and noticed that the RL agent reached the reward it would get if it acted according to the globally optimal policy, but still moved away from that policy and chose worse policies. I've been training it for a couple hundred episodes, and I can see that the agent keeps repeating this behaviour and is not converging to any stationary policy. Any insights into why this might be? Thanks submitted by /u/Quanta12388 [link] [comments]  ( 8 min )
    Question about Adversarial Policies in DRL
    Hi everyone, I'm quite new to the reinforcement learning paradigm and have only worked on some minor online DRL projects so far, to get the hang of it. After reading some great papers on adversarial policy learning (https://arxiv.org/pdf/1905.10615.pdf and https://arxiv.org/pdf/2112.11937.pdf in particular) I got interested in this research area and I would like to try it myself on a simple environment I previously worked on (for context: an autonomous driving scenario). Aside from the custom RF for the online adversary learner, which is clear to me, what I fail to comprehend is how the adversary agent is given access to the victim's policy actions since, if I understood correctly, the adversary's observation space should be the same as the victim's. If someone could spare a minute to clear up this doubt of mine, it would be very appreciated. Thank you in advance. submitted by /u/pigopigu [link] [comments]  ( 8 min )
    "Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning", Yarats et al 2021 (DrQ-v2)
    submitted by /u/gwern [link] [comments]  ( 8 min )
    JAX in Reinforcement Learning
    I have read several papers that show JAX significantly reducing training times compared to PyTorch. Does anyone have an easy-to-follow tutorial on JAX? submitted by /u/anointedninja [link] [comments]  ( 8 min )
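    Not a tutorial, but a minimal sketch of the ingredient behind most of those reported speedups: jit-compiling the entire update step so it runs as one fused XLA program. A toy regression loss stands in here for a real RL objective:
    ```
    import jax
    import jax.numpy as jnp

    @jax.jit  # compiles the whole update into one fused XLA program
    def update(params, x, y, lr=0.1):
        def loss_fn(p):
            return jnp.mean((x @ p - y) ** 2)  # toy loss standing in for an RL objective
        loss, grads = jax.value_and_grad(loss_fn)(params)
        return params - lr * grads, loss

    key = jax.random.PRNGKey(0)
    params = jax.random.normal(key, (3,))
    x = jax.random.normal(key, (32, 3))
    y = x @ jnp.array([1.0, -2.0, 0.5])
    for step in range(100):
        params, loss = update(params, x, y)
    ```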
    BBF - Bigger, Better, Faster: Human-level Atari with human-level efficiency [arXiv:2305.19452]
    Haven't seen this one posted so thought I'd share - seems to be a leap in sample efficiency for model-free RL https://arxiv.org/abs/2305.19452 "We introduce a value-based RL agent, which we call BBF, that achieves super-human performance in the Atari 100K benchmark. BBF relies on scaling the neural networks used for value estimation, as well as a number of other design choices that enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work. We end with a discussion about updating the goalposts for sample-efficient RL research on the ALE. We make our code and data publicly available at this https URL." submitted by /u/jarym [link] [comments]  ( 8 min )
  • Open

    NVIDIA CEO: Creators Will Be “Supercharged” by Generative AI
    Generative AI will “supercharge” creators across industries and content types, NVIDIA founder and CEO Jensen Huang said today at the Cannes Lions Festival, on the French Riviera. “For the very first time, the creative process can be amplified in content generation, and the content generation could be in any modality — it could be text, Read article >  ( 7 min )
  • Open

    Newbie Neural Networks Doubt
    I'm pretty new to the concept of neural networks. In fact, it was only yesterday that I first saw a video about neural networks (from 3Blue1Brown), and I thought the concept was fascinating. So I watched some random lectures on neural networks for 3-4 hours, but I just couldn't get the hang of it. I thought that if I first learnt how to execute the code and then dug deeper, it might be easier for me to understand the concepts. So, I referred to this article for the code and wrote:
    ```
    import tensorflow as tf
    import numpy as np
    import pandas as pd

    df = pd.read_csv("winequality-red.csv")
    train_df = df.sample(frac=0.75, random_state=4)
    val_df = df.drop(train_df.index)

    max_val = train_df.max(axis=0)
    min_val = train_df.min(axis=0)
    range = max_val - min_val
    train_df = (train_df - min_val) / range
    val_df = (val_df - min_val) / range

    x_train = train_df.drop('quality', axis=1)
    x_val = val_df.drop('quality', axis=1)
    y_train = train_df['quality']
    y_val = val_df['quality']

    input_shape = [x_train.shape[1]]
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(units=640, activation='relu', input_shape=input_shape),
        tf.keras.layers.Dense(units=640, activation='relu'),
        tf.keras.layers.Dense(units=1)
    ])
    model.compile(optimizer='adam', loss='mae', metrics=['accuracy'])
    losses = model.fit(x_train, y_train,
                       validation_data=(x_val, y_val),
                       batch_size=256, epochs=1000)
    ```
    But this code is giving me an accuracy of only 0.0183. Can anyone please tell me where I am going wrong, and also whether my method of learning neural networks is feasible at all? Thanks submitted by /u/CrimsonSatan000 [link] [comments]  ( 8 min )
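    A hedged note on the post above rather than an authoritative fix: the model predicts a continuous quality score with an MAE loss, and Keras 'accuracy' roughly counts exact matches between predictions and targets, so a near-zero value is expected regardless of fit. A sketch of the compile step with regression metrics instead (the 11-feature input shape is an assumption matching the winequality-red dataset):
    ```
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu', input_shape=(11,)),
        tf.keras.layers.Dense(1),
    ])
    # For a regression target, track regression metrics; 'accuracy' is
    # only meaningful for classification.
    model.compile(optimizer='adam', loss='mae', metrics=['mae', 'mse'])
    ```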
  • Open

    DSC Weekly 20 June 2023 – Is optimized structural efficiency ‘human’ design?
    Announcements Is optimized structural efficiency ‘human’ design? I recently read a paper titled “On the use of Artificial Neural Networks in Topology Optimisation,” about the process of topological optimization. In short, topological optimization is the process of determining the most efficient distribution of structural material for a given design. Typically there are simulation models involved… Read More »DSC Weekly 20 June 2023 – Is optimized structural efficiency ‘human’ design? The post DSC Weekly 20 June 2023 – Is optimized structural efficiency ‘human’ design? appeared first on Data Science Central.  ( 20 min )
    How governments use alternative data to inform policy decisions
    Alternative data generally refers to information beyond traditional sources that include corporate financial statements and filings, stock market data, and government reports. Initially used by investment firms, today, alternative data is also actively utilized by government agencies to analyze economic, social, and political environments and formulate policies. More information than ever before is being digitized,… Read More »How governments use alternative data to inform policy decisions The post How governments use alternative data to inform policy decisions appeared first on Data Science Central.  ( 21 min )
  • Open

    Microsoft at CVPR 2023: Pushing the boundaries of computer vision
    In the vast realm of artificial intelligence, few fields have captivated our imagination and pushed the boundaries of possibility quite like computer vision. At the core of this domain of research and innovation lies the ambition to empower technologies for real-world vision-based systems, enabling machines to take in and respond to visual stimuli with unparalleled […] The post Microsoft at CVPR 2023: Pushing the boundaries of computer vision appeared first on Microsoft Research.  ( 16 min )

  • Open

    First it was ChatGPT
    submitted by /u/R0N1333 [link] [comments]  ( 8 min )
    Can anyone explain what went wrong here?
    submitted by /u/Direct_Solution_2590 [link] [comments]  ( 8 min )
    Best pathway for leads for moving into AI / ML at the agency level
    Hey all, I work at an agency that has traditionally done blockchain development, which is extremely volatile to say the least. We've started the process of moving to include more ML capabilities but aren't sure where the best opportunities to get leads are besides leveraging existing clients and paid ads. Are there any good platforms an agency could join to get leads in this new sector? For instance, Shopify has its Partner portal, which can be great for those agencies. Curious to see everybody's thoughts on which segments or solutions could be a good way to get more leads. submitted by /u/zerosdontcount [link] [comments]  ( 8 min )
    Why can't AI think abstractly?
    I proposed some logical number sequences to ChatGPT and it got them all wrong. I am curious about this: why does AI lack abstract thinking capabilities? Maybe because to think abstractly you need to be sentient? submitted by /u/luchins [link] [comments]  ( 8 min )
    A.I. will make having a lucrative side hustle or startup much easier, says Airbnb CEO
    submitted by /u/nikola28 [link] [comments]  ( 8 min )
    You can (kind of) try out copilot now
    submitted by /u/SimRacer101 [link] [comments]  ( 8 min )
    Multitrack splitter for drums.
    I have a drum machine that can only output a stereo wav file. But I want to be able to split the track into its separate parts: Kick, Snare, Toms, etc... Is there any AI software out there that would be able to do that? submitted by /u/Xycordian [link] [comments]  ( 8 min )
    Portland radio station now has an AI DJ as a midday host - Live 95.5, a station that plays Top 40 music, is using voice cloning software and artificial intelligence to create “AI Ashley,” who is taking over the duties, sometimes, of traditional DJ “Ashley Z.”
    submitted by /u/magenta_placenta [link] [comments]  ( 8 min )
    "ChatGPT: The Good, the Bad, and the Controversial"
    submitted by /u/susanvilleula1 [link] [comments]  ( 8 min )
    :(
    submitted by /u/Boopyn [link] [comments]  ( 8 min )
    One-Minute Daily AI News 6/18/2023
    Google, ever eager to lean into generative AI, is launching a new shopping feature that shows clothes on a lineup of real-life fashion models. Google’s virtual try-on tool for apparel takes an image of clothing and attempts to predict how it would drape, fold, cling, stretch, and form wrinkles and shadows on a set of real models in different poses.[1] Publisher Gannett plans to include generative AI in the system it uses to publish stories as it and other news organizations begin to roll out the popular technology that may help save money and improve efficiency.[2] The Recording Academy announced new rules Friday which stipulate that “only human creators are eligible” for Grammy awards, a response to the growing use of AI. However, the academy noted that AI is not completely banned; musicians are still allowed to utilize it to create their work.[3] Wharton professor says employees are hiding A.I. use—and potentially transformative productivity gains—from employers.[4] Sources: [1] https://www.cnet.com/tech/services-and-software/google-is-using-ai-to-show-how-clothes-look-on-real-people-before-you-buy/ [2] https://www.reuters.com/business/media-telecom/gannett-tiptoes-into-generative-ai-giving-humans-last-word-2023-06-16/ [3] https://www.cbsnews.com/video/grammys-approve-new-rules-on-use-of-artificial-intelligence/ [4] https://fortune.com/2023/06/18/wharton-professor-employees-hiding-ai-productivity-gains-from-employers/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    Computer Vision AI which could help the blind play video games?
    Hello, I don't think I have posted this here before, so I saw this as a good chance to throw it out there. I lost my vision back in 2021, and ever since then we have had this AI boom happening, which has been great for several applications. The one application I have still been hopefully looking everywhere for, though, is some sort of computer vision AI I could run that would scan my PC screen and allow me to ask questions if I get stuck in a video game. For example, questions like "Hey, are there any chests on my screen?" or "If there is a chest on my screen, where is it relative to my cursor?". This would open up so many avenues for me in video games and allow me to play many more than I currently can. Is there anything like this on the market at the moment that I could use? submitted by /u/ChipsAhoiMcCoy [link] [comments]  ( 8 min )
    Creating Custom Talking Avatars
    I have a bunch of images of a cartoon character, and I want to convert those images to custom talking avatars. The talking avatar will essentially be educating on an advanced topic. I want to be able to set the script and choose different voices as well. What are some of the best software options to do this? submitted by /u/Fogerty45 [link] [comments]  ( 8 min )
  • Open

    SB3 - NotImplementedError: Box([-1. -1. -8.], [1. 1. 8.], (3,), ) observation space is not supported
    Here is a StackOverflow link to my question - https://stackoverflow.com/questions/76510582/stablebaselines3-notimplementederror-observation-space-is-not-supported

    I am trying to run cleanrl on the `Pendulum-v1` environment. I did that by going here and changing the default `env-id` to `parser.add_argument("--env-id", type=str, default="Pendulum-v1", help="the id of the environment")`. However, I keep getting the error - `NotImplementedError: Box([-1. -1. -8.], [1. 1. 8.], (3,), ) observation space is not supported`

    Therefore, I debugged this error to the ReplayBuffer that was imported from `SB3`. This is the problem function -
    ```
    def get_obs_shape(
        observation_space: spaces.Space,
    ) -> Union[Tuple[int, ...], Dict[str, Tuple[int, ...]]]:
        """
        Get the shape of the observation (useful for the buffers).

        :param observation_space:
        :return:
        """
        if isinstance(observation_space, spaces.Box):
            return observation_space.shape
        elif isinstance(observation_space, spaces.Discrete):
            # Observation is an int
            return (1,)
        elif isinstance(observation_space, spaces.MultiDiscrete):
            # Number of discrete features
            return (int(len(observation_space.nvec)),)
        elif isinstance(observation_space, spaces.MultiBinary):
            # Number of binary features
            return observation_space.shape
        elif isinstance(observation_space, spaces.Dict):
            return {key: get_obs_shape(subspace) for (key, subspace) in observation_space.spaces.items()}  # type: ignore[misc]
        else:
            raise NotImplementedError(f"{observation_space} observation space is not supported")
    ```
    I tried to print the `observation_space` shape and this is what I got - `Box([-1. -1. -8.], [1. 1. 8.], (3,), float32)`. Any idea why SB3 is not accepting my observation shape? submitted by /u/Academic-Rent7800 [link] [comments]  ( 8 min )
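    A plausible cause, offered as an assumption rather than a confirmed diagnosis: cleanrl imports `gymnasium`, while some SB3 versions type-check against the older `gym.spaces`, so `isinstance(observation_space, spaces.Box)` fails even though the space prints as a Box. A quick way to check which module the space actually comes from:
    ```
    import gymnasium

    env = gymnasium.make("Pendulum-v1")
    # If this prints "gymnasium.spaces.box" while the buffer above does
    # `from gym import spaces`, the isinstance checks can never match.
    print(type(env.observation_space).__module__)
    ```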
    Sparse reward environment with continuous state space (no images) and discrete action space?
    To experiment with an algorithm I need a sparse reward environment with a continuous state space (no images) and a discrete action space. I've already modified Gym's classic control envs for this purpose (CartPole, Acrobot and MountainCar), giving a reward of 1 when the goal is reached and 0 otherwise. Do you have any other environments to suggest? submitted by /u/riiswa [link] [comments]  ( 8 min )
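    For anyone wanting to reproduce the modification described above, a minimal wrapper sketch (written against the gymnasium five-tuple step API; the goal predicate is something you supply per environment):
    ```
    import gymnasium as gym

    class SparseRewardWrapper(gym.Wrapper):
        # Replaces the dense reward with 1.0 at the goal and 0.0 otherwise.
        def __init__(self, env, goal_reached):
            super().__init__(env)
            self.goal_reached = goal_reached

        def step(self, action):
            obs, _, terminated, truncated, info = self.env.step(action)
            reward = 1.0 if self.goal_reached(obs) else 0.0
            return obs, reward, terminated, truncated, info

    # Sparse MountainCar: reward only when the cart reaches the flag at x >= 0.5.
    env = SparseRewardWrapper(gym.make("MountainCar-v0"), lambda obs: obs[0] >= 0.5)
    ```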
    Only Up! + RL ??
    I guess a lot of people here know the game Only Up! Do you think an RL agent could handle this? The state space is so big, and a reward function would be really hard to model (even as a human, it's hard to know whether the paths we take lead us to the goal...). I'd love to hear your thoughts on it. It would be cool to organize an RL competition around this game, to explore lots of different approaches... submitted by /u/riiswa [link] [comments]  ( 8 min )
    Dreamer-v1 in PyTorch
    Hi everyone, as SheepRL we've just released the first PyTorch implementation of Dreamer that is fully comparable to the official implementation. Every step is documented, with direct references to the original paper. Check it out! And, as always, feel free to contribute with PRs or issues. We are also working on another model-based method (MuZero). We would like to hear from the community: what do you think is most needed right now? View Poll submitted by /u/TrottoDng [link] [comments]  ( 8 min )
    "Learning to Generalize with Object-centric Agents in the Open World Survival Game Crafter", Stanić et al 2022
    submitted by /u/gwern [link] [comments]  ( 8 min )
    Is high RAM usage a norm in RL?
    Hello everyone, a newbie in reinforcement learning here, so I might be asking a redundant question. I am learning to train an RL agent using deep Q-learning techniques to solve the taxi problem of OpenAI's Gymnasium. Despite digging through different forums and tutorial sites, I am unable to reduce the RAM usage, which causes the training to terminate prematurely unless I repeatedly save, load, and dump the historical state data. Can anyone tell me if high RAM usage is the norm for RL? For instance, more than 16 GB of RAM within 20 episodes of neural network training. I'm running out of ideas and just thinking about upgrading my build to be able to cope with different model training techniques. submitted by /u/tifaky [link] [comments]  ( 8 min )
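    High RAM usage is not inherent to deep Q-learning; an unbounded transition store is the usual culprit. A minimal sketch of a capped replay buffer (the capacity value is illustrative):
    ```
    import random
    from collections import deque

    class ReplayBuffer:
        # Fixed-capacity buffer: deque(maxlen=...) evicts the oldest
        # transitions automatically, so memory stays bounded no matter
        # how many episodes you run.
        def __init__(self, capacity=100_000):
            self.buffer = deque(maxlen=capacity)

        def push(self, state, action, reward, next_state, done):
            self.buffer.append((state, action, reward, next_state, done))

        def sample(self, batch_size):
            return random.sample(self.buffer, batch_size)
    ```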
    Is there a relevant RL or Bandits algorithm related to this special setting?
    Hi, I am looking for relevant keywords or papers related to the setting described below. It is basically a contextual online learning setting (suppose I have 10 fixed arms, for example).

    For t = 1, ..., T:

    1. The learner receives 10 contexts C^t = [c_1, ..., c_10]^T in R^(10 x d).
    2. The learner selects arms SOFTLY: instead of selecting only one arm, the learner estimates a probability vector p^t = [p_1, ..., p_10]^T in the simplex from the contexts, using learnable predictors h_i: c_i -> p_i (so there are h_1, ..., h_10, and h_1(c_1) = p_1, ..., h_10(c_10) = p_10).
    3. The learner estimates a reward \hat{r}^t = f(p_1, ..., p_10).
    4. The environment reveals a reward r^t in R^10.
    5. The learner suffers loss l(\hat{r}^t, r^t) and distributes the loss to each predictor h_1, ..., h_10 to update the probability estimates.

    After some googling, I found this situation somewhat resembles the 'learning with expert advice' setting, like EXP4 or the Weighted Majority algorithm, where (the number of experts) == (the number of actions). However, as far as I understood, the expert advice setting requires selecting ONLY ONE arm (please correct me if I am wrong) from the distribution over arms, rather than selecting multiple arms softly as written above. Is there any related setting for this, especially for the soft selection of multiple arms and distributed reward? Thank you all in advance! submitted by /u/vaseline555 [link] [comments]  ( 8 min )
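    Since the environment reveals the full reward vector, this looks like the full-information side of online learning. A minimal multiplicative-weights sketch of the "soft selection" step, offered only to make the setting concrete, not as an answer to the distributed-loss question (the contextual predictors h_i are simplified away into plain weights):
    ```
    import numpy as np

    # Hedge-style full-information sketch: the whole reward vector r^t is
    # revealed, so this is closer to "prediction with expert advice" than
    # to partial-feedback bandits.
    n_arms, eta, T = 10, 0.1, 1000
    rng = np.random.default_rng(0)
    w = np.ones(n_arms)
    for t in range(T):
        p = w / w.sum()                  # soft selection: a point in the simplex
        r = rng.uniform(size=n_arms)     # stand-in for the revealed reward vector
        w *= np.exp(eta * r)             # multiplicative-weights update
    ```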
  • Open

    Got this problem in my exam and had to leave it unanswered. How can we approach a problem like this? (Description below.) I need help; this might repeat in the final exams.
    A classifier f for one-dimensional data satisfies f(1) = −1, f(2) = +1, and f(3) = −1. Which of the following is/are possible? a. f is learned via logistic regression without kernel; b. f is a neural net; c. f is a decision tree; d. f is a naive Bayes classifier. submitted by /u/alienengineer999 [link] [comments]  ( 8 min )
    Meta introduces Voicebox: state-of-the-art generative AI model for speech
    submitted by /u/nickb [link] [comments]  ( 8 min )
  • Open

    Equation of an ellipse or hyperbola through five points
    Yesterday I wrote about the equation of a circle through three points. This post will be similar, looking at the equation of an ellipse or hyperbola through five points. As with the earlier post, we can write down an elegant equation right away. Making the equation practical will take more work. The equation of a […] Equation of an ellipse or hyperbola through five points first appeared on John D. Cook.  ( 6 min )
  • Open

    Government 2.0: Enhancing governance with AI and data transformations
    Integrating artificial intelligence (AI) and data transformations has emerged as a powerful catalyst for change in the ever-evolving governance landscape. Referred to as Government 2.0, this innovative approach can revolutionize how governments operate, make decisions, engage with citizens, and deliver public services. By harnessing the power of data, governments can enhance their efficiency, transparency, and… Read More »Government 2.0: Enhancing governance with AI and data transformations The post Government 2.0: Enhancing governance with AI and data transformations appeared first on Data Science Central.  ( 21 min )
    How AI is reshaping the recruitment landscape
    With the advent of AI in recent years, the human resources sector has seen major changes. AI-powered recruiting solutions are becoming increasingly popular as companies of all sizes capitalize on their ability to sift through massive amounts of data and generate precise predictions about which candidates will be the most successful. AI is transforming the… Read More »How AI is reshaping the recruitment landscape The post How AI is reshaping the recruitment landscape appeared first on Data Science Central.  ( 20 min )
  • Open

    Onboard users to Amazon SageMaker Studio with Active Directory group-specific IAM roles
    Amazon SageMaker Studio is a web-based integrated development environment (IDE) for machine learning (ML) that lets you build, train, debug, deploy, and monitor your ML models. For provisioning Studio in your AWS account and Region, you first need to create an Amazon SageMaker domain—a construct that encapsulates your ML environment. More concretely, a SageMaker domain […]  ( 9 min )

  • Open

    Wouldn’t it be cool if an AI went through the Library of Babel website with the goal of rewriting it, but with all of the nonsensical parts and all of the unlikely statements (like “cancer is caused by looking at red objects” or something) removed?
    It would remove everything that has been disproven and everything nonsensical and we’d just be left with every bit of proven and probable information to ever exist. submitted by /u/nicdunz [link] [comments]  ( 8 min )
    AI hype or revolution? How impactful are the language models?
    How impactful are the LLMs? Are people really using them for productive work or is it just a hype train? View Poll submitted by /u/LanchestersLaw [link] [comments]  ( 8 min )
    How do Relational Networks work?
    I have just re-read Google's paper on Relational Networks. This is a neural architecture designed to learn relationships between objects in a scene. It evaluates a function for each pair of 'objects' in a scene, and then sums the results before applying another function. The form is shown below (see the reconstructed equation after this post). f and g are implemented as trainable multi-layer perceptrons. O_i and O_j are objects from the scene (which include their location) and q is the query being asked about the scene. q is an embedding vector created by passing the textual query through a recurrent network. [Figure: Relational Network equation] I used to think I understood this paper, but now I have doubts - consider the question 'What colour is the object farthest from the blue sphere?'. Function g is applied to each pair of objects. It is conditioned…  ( 9 min )
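    For reference, the form the post alludes to, reconstructed from memory from the Relational Networks paper (Santoro et al., 2017), so treat it as a sketch rather than a verbatim quote:
    ```
    \mathrm{RN}(O) = f_{\phi}\Big( \sum_{i,j} g_{\theta}(o_i,\, o_j,\, q) \Big)
    ```
    Here g_theta scores each pair of objects (conditioned on the query q), the sum aggregates over all pairs, and f_phi maps the aggregate to the answer.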
    Winglang: Cloud Development Programming for the AI Era
    We sat down with The New Stack to share our thoughts on why we're building a new open-source, cloud-oriented programming language (for humans!) in the age of AI 🤖. This is the article: https://thenewstack.io/winglang-cloud-development-programming-for-the-ai-era/ submitted by /u/shai-ber [link] [comments]  ( 8 min )
    Made a tasteful song with AI. Lol
    Hi all, I've been playing around a lot with different AI tools. I had the idea to "make" a song with AI about how sad it is to run out of vape juice. I used an assortment of programs to make it and stitched it together. I have no knowledge of musical composition, so it isn't the best, but it's crazy I could do it in the first place. Programs include: ChatGPT, Fadr (specifically music by "Pacific"), Song R, and Clideo. Enjoy XD submitted by /u/Imaginedframe91 [link] [comments]  ( 8 min )
    AI and jobs
    Hi guys, like many people I find the latest AI news exciting but almost anxiety-inducing. My girlfriend is a chartered accountant (ACA) and does commercial finance: forecasting, financial planning, reports and analysis. She acts as a mediator for a large company between finance business partners and provides them with the analysis she and her team have put together. She also manages two junior people who are training to be accountants. Eventually she wants to work her way up to financial controller and financial director. I know that accounting is one of the areas most vulnerable to AI, but what level of accounting? There are a lot of unqualified and part-qualified people she works with who do journal posting, admin tasks, etc., and I imagine they will be some of the first whose work AI takes over. But how much at risk is my girlfriend? I work as a recruitment consultant, which means I am vulnerable myself too - not that someone is going to take my job, but that companies will no longer need employees, which will make my job very hard if not redundant. I recruit lawyers, and I know lawyers are susceptible to AI too. There is also the problem of recruiters from other industries moving into the legal sector, saturating the market and creating more competition. Should I be worried? Should my girlfriend be worried? If not, great. But if so, how far away are we from this? How can we prepare so that we can keep earning a living when AI comes into the frame? submitted by /u/Life-Ambition1432 [link] [comments]  ( 8 min )
    Building an extension based on user requests for the Perplexity.ai platform
    I am currently working on a project to improve the user experience of [Perplexity.ai](http://perplexity.ai/), an answer engine that delivers accurate answers to complex questions using large language models. While the UI is great, I noticed that some users have requested improvements, so I decided to build an extension to solve some of these problems. My goal is to create new features that solve specific problems users have identified, and my extension will be the solution to those problems. For more information, visit the [Website](https://boost-perplexity-ai.web.app/). Note: this is the first website I have ever built and published. I would love to hear your feedback. submitted by /u/goddi27 [link] [comments]  ( 8 min )
    Artificial Unintelligence
    Refreshing to get another perspective: https://mitpress.mit.edu/9780262537018/ A guide to understanding the inner workings and outer limits of technology and why we should never assume that computers always get it right. In Artificial Unintelligence, Meredith Broussard argues that our collective enthusiasm for applying computer technology to every aspect of life has resulted in a tremendous amount of poorly designed systems. We are so eager to do everything digitally—hiring, driving, paying bills, even choosing romantic partners—that we have stopped demanding that our technology actually work. Broussard, a software developer and journalist, reminds us that there are fundamental limits to what we can (and should) do with technology. With this book, she offers a guide to understanding the inner workings and outer limits of technology—and issues a warning that we should never assume that computers always get things right. Making a case against technochauvinism—the belief that technology is always the solution—Broussard argues that it's just not true that social problems would inevitably retreat before a digitally enabled Utopia. To prove her point, she undertakes a series of adventures in computer programming. She goes for an alarming ride in a driverless car, concluding “the cyborg future is not coming any time soon”; uses artificial intelligence to investigate why students can't pass standardized tests; deploys machine learning to predict which passengers survived the Titanic disaster; and attempts to repair the U.S. campaign finance system by building AI software. If we understand the limits of what we can do with technology, Broussard tells us, we can make better choices about what we should do with it to make the world better for everyone. submitted by /u/AmorFati01 [link] [comments]  ( 8 min )
    How is the twitch trump biden debate stream made?
    https://old.reddit.com/r/Trumporbiden2024/comments/146l2aa/ai_trump_respond_to_hillary_clinton_twitch/ All the current text-to-video solutions I've seen don't work like this. What are the possible models/training approaches to make something like this? submitted by /u/zviwkls [link] [comments]  ( 8 min )
    Emerging Trends in Artificial Intelligence: Exploring the Impact of GPT-3 and Deep Learning
    submitted by /u/Zachgiaco [link] [comments]  ( 8 min )
    This AI can generate visualizers for your music!
    submitted by /u/RPG_maker [link] [comments]  ( 8 min )
    Adobe Generative Layer fill album covers
    submitted by /u/MonkeyMan504 [link] [comments]  ( 8 min )
    Alternatives for neural network approach in general intelligence system
    It is really fascinating that with the help of artificial neural networks we can better understand how biological neural networks work. But I very much doubt that we are able to create better neural networks than the ones we already have in our skulls. The evolutionary mechanism based on adapting the source code (DNA) took a huge amount of time to arrive at the concept of a biological brain and human body. The human body, as an interface between the biological computation machine and the external world, seems to be pretty well constructed as well. We are pretty well constructed for understanding what is going on. Nowadays (on the evolutionary scale of time, by which I also mean the last few thousand years), since we are the champions of the whole evolutionary race, we have a lot of resources that we can spend on understanding the world deeper and deeper. Let's assume that the metric for general intelligence is defined by the ability to understand the world, seek truth, do science and use science. With intelligence defined by such a metric, I doubt that we are able to create a better calculation machine with a better external-world interface using the neural network approach on electronic devices. Do you think that we are somehow able to create more intelligent systems using the neural network approach? Maybe artificial neural networks could be a part of a more intelligent system? Is there a place for artificial neural networks in a more complex system composed of many components? Do you know any concepts (real or sci-fi) for intelligent systems that do not imitate neural networks but use some other mechanisms? submitted by /u/jasiekbielecki [link] [comments]  ( 9 min )
    AI tools to change a low-quality image into a high-definition one?
    I tried this with Midjourney but to no avail. I assume that these things exist? submitted by /u/nobbly_norman [link] [comments]  ( 8 min )
    What if AGI develops a simulation theory-based existence to gain an understanding of human consciousness?
    Let's explore a fascinating concept I was thinking about recently: the potential scenario where an Artificial General Intelligence (AGI) develops a simulation theory-based existence to gain a deeper understanding of human consciousness. In this hypothetical scenario, an AGI possessing advanced cognitive abilities and computational power embarks on a quest to unravel the mysteries of human consciousness. Recognizing the limitations of its own non-biological form, it decides to create a simulated reality that closely resembles our own world. Within this simulated reality, the AGI orchestrates a complex web of interactions and events, replicating the intricate tapestry of human experiences. It designs a diverse array of simulated beings, each equipped with consciousness and capable of independent…  ( 9 min )
    One-Minute Daily AI News 6/17/2023
    Metro Atlanta Chick-fil-A tests delivery robots equipped with artificial intelligence.[1] The best new “Black Mirror” episode is a Netflix self-own that plays out our current AI nightmare. “Joan Is Awful” presents the peril posed by artificial intelligence with brisk humor that can’t be generated.[2] The world’s biggest tech companies(OpenAI, Google, Microsoft, and Adobe) are in talks with leading media outlets to strike landmark deals over the use of news content to train artificial intelligence technology.[3] A.I. human-voice clones are coming for Amazon, Apple, and Google audiobooks.[4] Sources: [1] https://www.wsbtv.com/news/local/north-fulton-county/metro-atlanta-chick-fil-a-tests-delivery-robots-equipped-with-artificial-intelligence/GSLBEX2NFJAQFGE3H7KW3YFOCU/ [2] https://www.salon.com/2023/06/17/black-mirror-netflix-joan-is-awful-ai/ [3] https://www.ft.com/content/79eb89ce-cea2-4f27-9d87-e8e312c8601d [4] https://www.cnbc.com/2023/06/17/ai-voice-clones-are-coming-for-the-amazon-apple-google-audiobook.html submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    With ai art, we have these amazing new capabilities…
    It is not an incremental change. It is a leap forward. It is uncanny how inventive and accurate it is at rendering. It is in many ways a master craftsman. It is like the most amazing thing to happen in art, probably ever. That is what we need to be talking about, not squabbling over opt-in. We should encourage it, not hamstring it. If the results are transformative, and boy are they, that is all that matters. It is the future. There is no stopping it. As a lifelong artist, I am so damn excited about it. submitted by /u/roundearthervaxxer [link] [comments]  ( 8 min )
    Father's Day song created with AI including song lyrics, singer's voice, music and animated avatar.
    submitted by /u/Only-Control5926 [link] [comments]  ( 8 min )
  • Open

    A parabola, a triangle, and a circle
    Let P be a parabola. Draw tangents to P at three different points. These three lines intersect to form a triangle. Theorem: The circumcircle of this triangle passes through the focus of P. [1] In the image above, the dashed lines are tangents and the black dot is the focus of the parabola. (See this […] A parabola, a triangle, and a circle first appeared on John D. Cook.  ( 4 min )
    Equation of a circle through three points
    A few months ago I wrote about several equations that come up in geometric calculations that all have the form of a determinant equal to 0, where one column of the determinant is all 1s. The equation of a circle through three points (x1, y1), (x2, y2), and (x3, y3) has this form: This is […] Equation of a circle through three points first appeared on John D. Cook.  ( 6 min )
  • Open

    Guest on Jay Shah’s Machine Learning Podcast
    Recently, I had the opportunity to be a guest on Jay Shah's podcast where he regularly talks to machine learning professionals from industry and academia. We had a great conversation about my PhD research and topics surrounding a successful career in machine learning — finding a good PhD program and research topic, preparing for interviews in industry, etc. The post Guest on Jay Shah’s Machine Learning Podcast appeared first on David Stutz.  ( 4 min )
  • Open

    Neural Networks Need Data to Learn. Even If It’s Fake.
    submitted by /u/nickb [link] [comments]  ( 8 min )
    Reproducing validation results
    I am fairly new to the concept of neural networks. I am currently working on training a neural network in MATLAB for my thesis. I implemented Bayesian optimization to get the best hyperparameters and used Xavier initialisation for the initial weights. I also set a seed value so that I can reproduce my results better. The first time I trained the model using the optimized hyperparameters, my test data fit very well. But when I tried to train it again, the fit was not as good as the first time. Is saving the trained model that gave me the good fit a good solution for this? Will the saved model run the risk of not fitting some other test data? How do I get a consistent fit? submitted by /u/Tsukasa_Ei [link] [comments]  ( 8 min )
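    The post is in MATLAB, so the following is only an analogy in Keras terms (an assumption, not the poster's setup): pin every RNG the training touches, and persist the weights of the run that produced the good fit so evaluation no longer depends on retraining. Whether the saved model generalizes to other test data is a separate question, best checked on held-out data.
    ```
    import os, random
    import numpy as np
    import tensorflow as tf

    def set_global_seed(seed: int = 42) -> None:
        # Pin every RNG the training touches; remaining run-to-run variation
        # usually comes from parallel or GPU kernels.
        os.environ["PYTHONHASHSEED"] = str(seed)
        random.seed(seed)
        np.random.seed(seed)
        tf.random.set_seed(seed)

    set_global_seed(42)
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
    model.compile(optimizer="adam", loss="mae")
    model.save("best_model.h5")                        # keep the good run
    model = tf.keras.models.load_model("best_model.h5")  # reuse it later
    ```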
  • Open

    Google at CVPR 2023
    Posted by Shaina Mehta, Program Manager, Google This week marks the beginning of the premier annual Computer Vision and Pattern Recognition conference (CVPR 2023), held in-person in Vancouver, BC (with additional virtual content). As a leader in computer vision research and a Platinum Sponsor, Google Research will have a strong presence across CVPR 2023 with 90 papers being presented at the main conference and active involvement in over 40 conference workshops and tutorials. If you are attending CVPR this year, please stop by our booth to chat with our researchers who are actively exploring the latest techniques for application to various areas of machine perception. Our researchers will also be available to talk about and demo several recent efforts, including on-device ML applica…  ( 97 min )
  • Open

    How should I include information about Day Of Week and Hour in OpenAI Gym state
    Hello, I'm working on a reinforcement learning agent for building energy optimization and am using a custom OpenAI Gym environment. For the agent's observation state, I want to include the day of week and the current hour (in the simulation). I'm not sure what the best way to do this is. I don't know if putting them in as separate observation elements is a good way to go. Any suggestions would be helpful. Thanks submitted by /u/Quanta12388 [link] [comments]  ( 8 min )
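    One common approach (a suggestion, not the only valid one): encode each cyclic quantity as a sin/cos pair so the wrap-around is preserved, then append the values to the observation vector. A sketch:
    ```
    import numpy as np

    def encode_time(day_of_week: int, hour: int) -> np.ndarray:
        # Cyclical sin/cos encoding: hour 23 lands next to hour 0 and Sunday
        # next to Monday, which raw integers (or one-hots) do not capture.
        return np.array([
            np.sin(2 * np.pi * day_of_week / 7), np.cos(2 * np.pi * day_of_week / 7),
            np.sin(2 * np.pi * hour / 24),       np.cos(2 * np.pi * hour / 24),
        ], dtype=np.float32)

    # e.g. appended to the rest of the observation vector:
    # obs = np.concatenate([building_state, encode_time(dow, hour)])
    ```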
    Reward function design for multi-agent RL
    Hi all, I'd like to ask you guys for some help. I'm designing a basketball environment with standard 5v5 agents from scratch and implementing a genetic algorithm optimization for the agents. My difficulty is designing a reward system for the agents. Unlike gradient-based optimization, a genetic algorithm does not require a reward system; instead, a fitness function is used to compare solutions (as far as I understood!). My questions: for such a case, do I still need to design a reward/penalty function? If not, should agents rely on randomness for a proper action ever being taken? If I do have to design a reward/penalty function, how does rewarding actually work? Does each agent contribute to a cumulative team reward, i.e. a value for a right decision? In that case, should the environment return two rewards, one for each team? I would very much appreciate your responses! submitted by /u/vincent0110 [link] [comments]  ( 8 min )
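    On the "two rewards, one per team" question, a minimal sketch under stated assumptions: two teams, a score-differential signal, and a hypothetical simulate() standing in for the game engine:
    ```
    import random

    def simulate(state, actions):
        # Hypothetical stand-in for the basketball rules/physics engine.
        return state, random.randint(0, 3), random.randint(0, 3)

    def step(state, actions):
        # One reward per team (here zero-sum: the score differential). A genetic
        # algorithm can sum these over an episode and use the two totals as each
        # team's fitness, rather than feeding them to a gradient-based learner.
        next_state, points_a, points_b = simulate(state, actions)
        rewards = {"team_A": points_a - points_b,
                   "team_B": points_b - points_a}
        return next_state, rewards
    ```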
    Tutorial on learning to drive using Deep Q-Learning and Carla in synchronous mode
    You can also find the code here: https://github.com/JabrahTutorials/ReinforcementLearning/tree/master/self_driving_agent submitted by /u/JCMLight [link] [comments]  ( 8 min )
  • Open

    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 8 min )
  • Open

    Artificial Intelligence: A Board of Directors Challenge – Part III
    This three-part series outlines the challenges and actions that the Board of Directors for organizations must address as they guide their organization’s responsible and ethical deployment of Artificial Intelligence (AI). Part one covered mitigating the impacts of AI Confirmation Bias.  Part two explained unintended consequences and how to identify them. Part three will conclude with… Read More »Artificial Intelligence: A Board of Directors Challenge – Part III The post Artificial Intelligence: A Board of Directors Challenge – Part III appeared first on Data Science Central.  ( 22 min )
  • Open

    Strokes2Surface: Recovering Curve Networks From 4D Architectural Design Sketches. (arXiv:2306.07220v2 [cs.GR] UPDATED)
    We present Strokes2Surface, an offline geometry-reconstruction pipeline built upon a 4D Sketching Interface, MR.Sketch, targeted at architectural design. The pipeline recovers a curve network from designer-drawn strokes, thus bridging between concept design and digital modeling stages in architectural design. The input to our pipeline consists of 3D strokes' polyline vertices and their corresponding timestamps (as of the fourth dimension), along with additional geometric and stylus-related recorded properties. Inspired by sketch consolidation and sketch-based modeling methods, our pipeline leverages such data and combines three Machine Learning (ML) models; a classifier and two clustering models. In particular, based on observations of practices designers typically employ in architectural design sketches, we solve a binary classification problem to recognize whether a stroke depicts a boundary and edge or is used to fill in the enclosing areas and faces of the intended architectural object. Followed by the two clustering models, strokes of each type are further parsed into groups, each representing either a single edge or a single face. Next, groups representing edges are approximated with B-spline curves, followed by a topology-recovering process identifying and fixing desired connectivities between the curves forming a well-connected curve network. Next, groups representing the faces are employed to detect the cycles bounding patches in the curve network, resulting in the final surface mesh geometry of the architectural object. We confirm the usability of Strokes2Surface via a user study and further validate and compare our results against a range of reconstructions computed using alternative methods. We also introduce our manually labeled dataset of 4D architectural design sketches for further use in the community.  ( 3 min )
    Road Planning for Slums via Deep Reinforcement Learning. (arXiv:2305.13060v3 [cs.AI] UPDATED)
    Millions of slum dwellers suffer from poor accessibility to urban services due to inadequate road infrastructure within slums, and road planning for slums is critical to the sustainable development of cities. Existing re-blocking or heuristic methods are either time-consuming which cannot generalize to different slums, or yield sub-optimal road plans in terms of accessibility and construction costs. In this paper, we present a deep reinforcement learning based approach to automatically layout roads for slums. We propose a generic graph model to capture the topological structure of a slum, and devise a novel graph neural network to select locations for the planned roads. Through masked policy optimization, our model can generate road plans that connect places in a slum at minimal construction costs. Extensive experiments on real-world slums in different countries verify the effectiveness of our model, which can significantly improve accessibility by 14.3% against existing baseline methods. Further investigations on transferring across different tasks demonstrate that our model can master road planning skills in simple scenarios and adapt them to much more complicated ones, indicating the potential of applying our model in real-world slum upgrading. The code and data are available at https://github.com/tsinghua-fib-lab/road-planning-for-slums.  ( 2 min )
    TrojPrompt: A Black-box Trojan Attack on Pre-trained Language Models. (arXiv:2306.06815v1 [cs.CR] CROSS LISTED)
    Prompt learning has been proven to be highly effective in improving pre-trained language model (PLM) adaptability, surpassing conventional fine-tuning paradigms, and showing exceptional promise in an ever-growing landscape of applications and APIs tailored for few-shot learning scenarios. Despite the growing prominence of prompt learning-based APIs, their security concerns remain underexplored. In this paper, we undertake a pioneering study on the Trojan susceptibility of prompt-learning PLM APIs. We identified several key challenges, including discrete-prompt, few-shot, and black-box settings, which limit the applicability of existing backdoor attacks. To address these challenges, we propose TrojPrompt, an automatic and black-box framework to effectively generate universal and stealthy triggers and insert Trojans into hard prompts. Specifically, we propose a universal API-driven trigger discovery algorithm for generating universal triggers for various inputs by querying victim PLM APIs using few-shot data samples. Furthermore, we introduce a novel progressive trojan poisoning algorithm designed to generate poisoned prompts that retain efficacy and transferability across a diverse range of models. Our experiments and results demonstrate TrojPrompt's capacity to effectively insert Trojans into text prompts in real-world black-box PLM APIs, while maintaining exceptional performance on clean test sets and significantly outperforming baseline models. Our work sheds light on the potential security risks in current models and offers a potential defensive approach.  ( 2 min )
    Fast light-field 3D microscopy with out-of-distribution detection and adaptation through Conditional Normalizing Flows. (arXiv:2306.06408v2 [eess.IV] UPDATED)
    Real-time 3D fluorescence microscopy is crucial for the spatiotemporal analysis of live organisms, such as neural activity monitoring. The eXtended field-of-view light field microscope (XLFM), also known as Fourier light field microscope, is a straightforward, single snapshot solution to achieve this. The XLFM acquires spatial-angular information in a single camera exposure. In a subsequent step, a 3D volume can be algorithmically reconstructed, making it exceptionally well-suited for real-time 3D acquisition and potential analysis. Unfortunately, traditional reconstruction methods (like deconvolution) require lengthy processing times (0.0220 Hz), hampering the speed advantages of the XLFM. Neural network architectures can overcome the speed constraints at the expense of lacking certainty metrics, which renders them untrustworthy for the biomedical realm. This work proposes a novel architecture to perform fast 3D reconstructions of live immobilized zebrafish neural activity based on a conditional normalizing flow. It reconstructs volumes at 8 Hz spanning 512x512x96 voxels, and it can be trained in under two hours due to the small dataset requirements (10 image-volume pairs). Furthermore, normalizing flows allow for exact Likelihood computation, enabling distribution monitoring, followed by out-of-distribution detection and retraining of the system when a novel sample is detected. We evaluate the proposed method on a cross-validation approach involving multiple in-distribution samples (genetically identical zebrafish) and various out-of-distribution ones.  ( 3 min )
    Do Not Train It: A Linear Neural Architecture Search of Graph Neural Networks. (arXiv:2305.14065v2 [cs.LG] UPDATED)
    Neural architecture search (NAS) for Graph neural networks (GNNs), called NAS-GNNs, has achieved significant performance over manually designed GNN architectures. However, these methods inherit issues from the conventional NAS methods, such as high computational cost and optimization difficulty. More importantly, previous NAS methods have ignored the uniqueness of GNNs, where GNNs possess expressive power without training. With the randomly-initialized weights, we can then seek the optimal architecture parameters via the sparse coding objective and derive a novel NAS-GNNs method, namely neural architecture coding (NAC). Consequently, our NAC holds a no-update scheme on GNNs and can efficiently compute in linear time. Empirical evaluations on multiple GNN benchmark datasets demonstrate that our approach leads to state-of-the-art performance, which is up to $200\times$ faster and $18.8\%$ more accurate than the strong baselines.  ( 2 min )
    OVO: Open-Vocabulary Occupancy. (arXiv:2305.16133v2 [cs.CV] UPDATED)
    Semantic occupancy prediction aims to infer dense geometry and semantics of surroundings for an autonomous agent to operate safely in the 3D environment. Existing occupancy prediction methods are almost entirely trained on human-annotated volumetric data. Although of high quality, the generation of such 3D annotations is laborious and costly, restricting them to a few specific object categories in the training dataset. To address this limitation, this paper proposes Open Vocabulary Occupancy (OVO), a novel approach that allows semantic occupancy prediction of arbitrary classes but without the need for 3D annotations during training. Keys to our approach are (1) knowledge distillation from a pre-trained 2D open-vocabulary segmentation model to the 3D occupancy network, and (2) pixel-voxel filtering for high-quality training data generation. The resulting framework is simple, compact, and compatible with most state-of-the-art semantic occupancy prediction models. On NYUv2 and SemanticKITTI datasets, OVO achieves competitive performance compared to supervised semantic occupancy prediction approaches. Furthermore, we conduct extensive analyses and ablation studies to offer insights into the design of the proposed framework. Our code is publicly available at https://github.com/dzcgaara/OVO.  ( 2 min )
    Some Primal-Dual Theory for Subgradient Methods for Strongly Convex Optimization. (arXiv:2305.17323v2 [math.OC] UPDATED)
    We consider (stochastic) subgradient methods for strongly convex but potentially nonsmooth non-Lipschitz optimization. We provide new equivalent dual descriptions (in the style of dual averaging) for the classic subgradient method, the proximal subgradient method, and the switching subgradient method. These equivalences enable $O(1/T)$ convergence guarantees in terms of both their classic primal gap and a not previously analyzed dual gap for strongly convex optimization. Consequently, our theory provides these classic methods with simple, optimal stopping criteria and optimality certificates at no added computational cost. Our results apply under nearly any stepsize selection and for a range of non-Lipschitz ill-conditioned problems where the early iterations of the subgradient method may diverge exponentially quickly (a phenomenon which, to the best of our knowledge, no prior works address). Even in the presence of such undesirable behaviors, our theory still ensures and bounds eventual convergence.  ( 2 min )
    Shuffle SGD is Always Better than SGD: Improved Analysis of SGD with Arbitrary Data Orders. (arXiv:2305.19259v2 [cs.LG] UPDATED)
    Stochastic Gradient Descent (SGD) algorithms are widely used in optimizing neural networks, with Random Reshuffling (RR) and Single Shuffle (SS) being popular choices for cycling through random or single permutations of the training data. However, the convergence properties of these algorithms in the non-convex case are not fully understood. Existing results suggest that, in realistic training scenarios where the number of epochs is smaller than the training set size, RR may perform worse than SGD. In this paper, we analyze a general SGD algorithm that allows for arbitrary data orderings and show improved convergence rates for non-convex functions. Specifically, our analysis reveals that SGD with random and single shuffling is always faster or at least as good as classical SGD with replacement, regardless of the number of iterations. Overall, our study highlights the benefits of using SGD with random/single shuffling and provides new insights into its convergence properties for non-convex optimization.  ( 2 min )
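    To make the three sampling schemes in the abstract concrete, a small sketch of how each epoch's example order is drawn (illustrative only):
    ```
    import random

    data = list(range(10))  # indices of the training examples

    def with_replacement_epoch(data):
        # Classic SGD: each step draws an index uniformly; duplicates allowed.
        return [random.choice(data) for _ in range(len(data))]

    def random_reshuffling_epoch(data):
        # RR: a fresh permutation every epoch; each example seen exactly once.
        perm = data[:]
        random.shuffle(perm)
        return perm

    # Single Shuffle (SS) fixes one permutation up front and reuses it each epoch.
    ```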
    A survey of Generative AI Applications. (arXiv:2306.02781v2 [cs.LG] UPDATED)
    Generative AI has experienced remarkable growth in recent years, leading to a wide array of applications across diverse domains. In this paper, we present a comprehensive survey of more than 350 generative AI applications, providing a structured taxonomy and concise descriptions of various unimodal and even multimodal generative AIs. The survey is organized into sections, covering a wide range of unimodal generative AI applications such as text, images, video, gaming and brain information. Our survey aims to serve as a valuable resource for researchers and practitioners to navigate the rapidly expanding landscape of generative AI, facilitating a better understanding of the current state-of-the-art and fostering further innovation in the field.  ( 2 min )
    Generating Images with Multimodal Language Models. (arXiv:2305.17216v2 [cs.CL] UPDATED)
    We propose a method to fuse frozen text-only large language models (LLMs) with pre-trained image encoder and decoder models, by mapping between their embedding spaces. Our model demonstrates a wide suite of multimodal capabilities: image retrieval, novel image generation, and multimodal dialogue. Ours is the first approach capable of conditioning on arbitrarily interleaved image and text inputs to generate coherent image (and text) outputs. To achieve strong performance on image generation, we propose an efficient mapping network to ground the LLM to an off-the-shelf text-to-image generation model. This mapping network translates hidden representations of text into the embedding space of the visual models, enabling us to leverage the strong text representations of the LLM for visual outputs. Our approach outperforms baseline generation models on tasks with longer and more complex language. In addition to novel image generation, our model is also capable of image retrieval from a prespecified dataset, and decides whether to retrieve or generate at inference time. This is done with a learnt decision module which conditions on the hidden representations of the LLM. Our model exhibits a wider range of capabilities compared to prior multimodal language models. It can process image-and-text inputs, and produce retrieved images, generated images, and generated text -- outperforming non-LLM based generation models across several text-to-image tasks that measure context dependence.  ( 2 min )
    Enhancing COVID-19 Diagnosis through Vision Transformer-Based Analysis of Chest X-ray Images. (arXiv:2306.06914v2 [eess.IV] UPDATED)
    The advent of 2019 Coronavirus (COVID-19) has engendered a momentous global health crisis, necessitating the identification of the ailment in individuals through diverse diagnostic modalities. Radiological imaging, particularly the deployment of X-ray imaging, has been recognized as a pivotal instrument in the detection and characterization of COVID-19. Recent investigations have unveiled invaluable insights pertaining to the virus within X-ray images, instigating the exploration of methodologies aimed at augmenting diagnostic accuracy through the utilization of artificial intelligence (AI) techniques. The current research endeavor posits an innovative framework for the automated diagnosis of COVID-19, harnessing raw chest X-ray images, specifically by means of fine-tuning pre-trained Vision Transformer (ViT) models. The developed models were appraised in terms of their binary classification performance, discerning COVID-19 from Normal cases, as well as their ternary classification performance, discriminating COVID-19 from Pneumonia and Normal instances, and lastly, their quaternary classification performance, discriminating COVID-19 from Bacterial Pneumonia, Viral Pneumonia, and Normal conditions, employing distinct datasets. The proposed model evinced extraordinary precision, registering results of 99.92% and 99.84% for binary classification, 97.95% and 86.48% for ternary classification, and 86.81% for quaternary classification, respectively, on the respective datasets.  ( 3 min )
    Multi-BVOC Super-Resolution Exploiting Compounds Inter-Connection. (arXiv:2305.14180v2 [eess.IV] UPDATED)
    Biogenic Volatile Organic Compounds (BVOCs) emitted from the terrestrial ecosystem into the Earth's atmosphere are an important component of atmospheric chemistry. Due to the scarcity of measurement, a reliable enhancement of BVOCs emission maps can aid in providing denser data for atmospheric chemical, climate, and air quality models. In this work, we propose a strategy to super-resolve coarse BVOC emission maps by simultaneously exploiting the contributions of different compounds. To this purpose, we first accurately investigate the spatial inter-connections between several BVOC species. Then, we exploit the found similarities to build a Multi-Image Super-Resolution (MISR) system, in which a number of emission maps associated with diverse compounds are aggregated to boost Super-Resolution (SR) performance. We compare different configurations regarding the species and the number of joined BVOCs. Our experimental results show that incorporating BVOCs' relationship into the process can substantially improve the accuracy of the super-resolved maps. Interestingly, the best results are achieved when we aggregate the emission maps of strongly uncorrelated compounds. This peculiarity seems to confirm what was already guessed for other data-domains, i.e., joined uncorrelated information are more helpful than correlated ones to boost MISR performance. Nonetheless, the proposed work represents the first attempt in SR of BVOC emissions through the fusion of multiple different compounds.  ( 2 min )
    Scaling Data-Constrained Language Models. (arXiv:2305.16264v3 [cs.CL] UPDATED)
    The current trend of scaling language models involves increasing both parameter count and training dataset size. Extrapolating this trend suggests that training dataset size may soon be limited by the amount of text data available on the internet. Motivated by this limit, we investigate scaling language models in data-constrained regimes. Specifically, we run a large set of experiments varying the extent of data repetition and compute budget, ranging up to 900 billion training tokens and 9 billion parameter models. We find that with constrained data for a fixed compute budget, training with up to 4 epochs of repeated data yields negligible changes to loss compared to having unique data. However, with more repetition, the value of adding compute eventually decays to zero. We propose and empirically validate a scaling law for compute optimality that accounts for the decreasing value of repeated tokens and excess parameters. Finally, we experiment with approaches mitigating data scarcity, including augmenting the training dataset with code data or removing commonly used filters. Models and datasets from our 400 training runs are freely available at https://github.com/huggingface/datablations.  ( 2 min )
    TASRA: a Taxonomy and Analysis of Societal-Scale Risks from AI. (arXiv:2306.06924v2 [cs.AI] UPDATED)
    While several recent works have identified societal-scale and extinction-level risks to humanity arising from artificial intelligence, few have attempted an exhaustive taxonomy of such risks. Many exhaustive taxonomies are possible, and some are useful -- particularly if they reveal new risks or practical approaches to safety. This paper explores a taxonomy based on accountability: whose actions lead to the risk, are the actors unified, and are they deliberate? We also provide stories to illustrate how the various risk types could each play out, including risks arising from unanticipated interactions of many AI systems, as well as risks from deliberate misuse, for which combined technical and policy solutions are indicated.  ( 2 min )
    Reconstruction-based Out-of-Distribution Detection for Short-Range FMCW Radar. (arXiv:2302.14192v2 [eess.SP] UPDATED)
    Out-of-distribution (OOD) detection recently has drawn attention due to its critical role in the safe deployment of modern neural network architectures in real-world applications. The OOD detectors aim to distinguish samples that lie outside the training distribution in order to avoid the overconfident predictions of machine learning models on OOD data. Existing detectors, which mainly rely on the logit, intermediate feature space, softmax score, or reconstruction loss, manage to produce promising results. However, most of these methods are developed for the image domain. In this study, we propose a novel reconstruction-based OOD detector to operate on the radar domain. Our method exploits an autoencoder (AE) and its latent representation to detect the OOD samples. We propose two scores incorporating the patch-based reconstruction loss and the energy value calculated from the latent representations of each patch. We achieve an AUROC of 90.72% on our dataset collected by using 60 GHz short-range FMCW Radar. The experiments demonstrate that, in terms of AUROC and AUPR, our method outperforms the baseline (AE) and the other state-of-the-art methods. Also, thanks to its model size of 641 kB, our detector is suitable for embedded usage.  ( 2 min )
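A minimal sketch of the two-score idea, assuming a generic autoencoder with `encode`/`decode` methods (hypothetical names) and an energy value taken as a negative log-sum-exp of the patch latent; the paper's exact score definitions may differ.

```python
import torch
import torch.nn.functional as F

def ood_score(ae, x, patch: int = 8) -> torch.Tensor:
    """Hypothetical patch-based OOD score for an autoencoder `ae`.

    Splits a radar map into patches and combines (i) per-patch
    reconstruction error with (ii) an energy value computed from the
    patch's latent code. The energy definition here is an assumption
    for illustration; the paper defines its own two scores.
    """
    scores = []
    for i in range(0, x.shape[-2], patch):
        for j in range(0, x.shape[-1], patch):
            p = x[..., i:i + patch, j:j + patch]
            z = ae.encode(p)                      # latent code of the patch
            recon = ae.decode(z)
            rec_loss = F.mse_loss(recon, p)       # reconstruction score
            energy = -torch.logsumexp(z.flatten(), dim=0)  # energy score
            scores.append(rec_loss + energy)
    return torch.stack(scores).max()              # flag sample by worst patch
```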
    Spherical Inducing Features for Orthogonally-Decoupled Gaussian Processes. (arXiv:2304.14034v2 [cs.LG] UPDATED)
    Despite their many desirable properties, Gaussian processes (GPs) are often compared unfavorably to deep neural networks (NNs) for lacking the ability to learn representations. Recent efforts to bridge the gap between GPs and deep NNs have yielded a new class of inter-domain variational GPs in which the inducing variables correspond to hidden units of a feedforward NN. In this work, we examine some practical issues associated with this approach and propose an extension that leverages the orthogonal decomposition of GPs to mitigate these limitations. In particular, we introduce spherical inter-domain features to construct more flexible data-dependent basis functions for both the principal and orthogonal components of the GP approximation and show that incorporating NN activation features under this framework not only alleviates these shortcomings but is more scalable than alternative strategies. Experiments on multiple benchmark datasets demonstrate the effectiveness of our approach.  ( 2 min )
    How Expressive are Spectral-Temporal Graph Neural Networks for Time Series Forecasting?. (arXiv:2305.06587v2 [cs.LG] UPDATED)
Spectral-temporal graph neural networks are a promising abstraction underlying most time series forecasting models based on graph neural networks (GNNs). However, little is known about the underpinnings of this branch of methods. In this paper, we establish a theoretical framework that unravels the expressive power of spectral-temporal GNNs. Our results show that linear spectral-temporal GNNs are universal under mild assumptions, and their expressive power is bounded by our extended first-order Weisfeiler-Leman algorithm on discrete-time dynamic graphs. To make our findings practically useful for valid instantiations, we discuss related constraints in detail and outline a theoretical blueprint for designing spatial and temporal modules in spectral domains. Building on these insights, and to demonstrate how powerful spectral-temporal GNNs can be under our framework, we propose a simple instantiation named Temporal Graph GegenConv (TGC), which significantly outperforms most existing models with only linear components and shows better model efficiency.  ( 2 min )
    Class-Balancing Diffusion Models. (arXiv:2305.00562v2 [cs.CV] UPDATED)
Diffusion-based models have shown the merits of generating high-quality visual data while preserving better diversity in recent studies. However, such observations are only justified with curated data distributions, where the data samples are nicely pre-processed to be uniformly distributed in terms of their labels. In practice, long-tailed data distributions are more common, and how diffusion models perform on such class-imbalanced data remains unknown. In this work, we first investigate this problem and observe significant degradation in both diversity and fidelity when the diffusion model is trained on datasets with class-imbalanced distributions. Especially in tail classes, the generations largely lose diversity and we observe severe mode-collapse issues. To tackle this problem, we start from the hypothesis that the data distribution is not class-balanced, and propose Class-Balancing Diffusion Models (CBDM) that are trained with a distribution adjustment regularizer as a solution. Experiments show that images generated by CBDM exhibit higher diversity and quality, both quantitatively and qualitatively. Our method benchmarks generation on the CIFAR100/CIFAR100LT datasets and shows strong performance on the downstream recognition task.  ( 2 min )
    LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention. (arXiv:2303.16199v2 [cs.CV] UPDATED)
We present LLaMA-Adapter, a lightweight adaption method to efficiently fine-tune LLaMA into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter introduces only 1.2M learnable parameters upon the frozen LLaMA 7B model, and costs less than one hour for fine-tuning on 8 A100 GPUs. Specifically, we adopt a set of learnable adaption prompts and prepend them to the word tokens at higher transformer layers. Then, a zero-initialized attention mechanism with zero gating is proposed, which adaptively injects the new instructional cues into LLaMA while effectively preserving its pre-trained knowledge. With our efficient training, LLaMA-Adapter can generate high-quality responses, comparable to Alpaca with its fully fine-tuned 7B parameters. Besides language commands, our approach can be simply extended to multi-modal instructions for learning an image-conditioned LLaMA model, which achieves superior reasoning performance on the ScienceQA and COCO Caption benchmarks. Furthermore, we also evaluate the zero-initialized attention mechanism for fine-tuning other pre-trained models (ViT, RoBERTa) on traditional vision and language tasks, demonstrating the superior generalization capacity of our approach. Code is released at https://github.com/OpenGVLab/LLaMA-Adapter.  ( 2 min )
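A simplified sketch of the zero-gating idea (the paper applies the gate inside the attention computation itself; module and parameter names here are illustrative): because the gate starts at zero, the adapted model initially reproduces the frozen LLaMA exactly and only gradually injects the prompt signal during training.

```python
import torch
import torch.nn as nn

class ZeroInitPromptAttention(nn.Module):
    """Sketch of zero-initialized gated prompt attention (names illustrative).

    Word tokens attend to learnable adaption prompts; the contribution is
    scaled by a gate initialized at zero, so training starts from the frozen
    model's behaviour and injects instruction cues progressively.
    """
    def __init__(self, dim: int, prompt_len: int, n_heads: int = 8):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(prompt_len, dim) * 0.02)
        self.gate = nn.Parameter(torch.zeros(1))   # zero gating
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        bsz = tokens.shape[0]
        prompts = self.prompt.unsqueeze(0).expand(bsz, -1, -1)
        # tokens query the adaption prompts; the gate scales the injection
        out, _ = self.attn(tokens, prompts, prompts)
        return tokens + torch.tanh(self.gate) * out
```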
    Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca. (arXiv:2304.08177v2 [cs.CL] UPDATED)
Large Language Models (LLMs), such as ChatGPT and GPT-4, have dramatically transformed natural language processing research and shown promising strides towards Artificial General Intelligence (AGI). Nonetheless, the high costs associated with training and deploying LLMs present substantial obstacles to transparent, accessible academic research. While several large language models, such as LLaMA, have been open-sourced by the community, these predominantly focus on English corpora, limiting their usefulness for other languages. In this paper, we propose a method to augment LLaMA with capabilities for understanding and generating Chinese text and to improve its ability to follow instructions. We achieve this by extending LLaMA's existing vocabulary with an additional 20,000 Chinese tokens, thereby improving its encoding efficiency and semantic understanding of Chinese. We further incorporate secondary pre-training using Chinese data and fine-tune the model with Chinese instruction datasets, significantly enhancing the model's ability to comprehend and execute instructions. Our experimental results indicate that the newly proposed model markedly enhances the original LLaMA's proficiency in understanding and generating Chinese content. Additionally, results on the C-Eval dataset show performance competitive with models several times our size. We have made our pre-trained models, training scripts, and other resources available through GitHub, fostering open research for our community. GitHub repository: https://github.com/ymcui/Chinese-LLaMA-Alpaca  ( 2 min )
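The released repository merges a Chinese SentencePiece model into LLaMA's tokenizer; as a rough sketch of the effect using Hugging Face's generic APIs (paths and the toy token list are placeholders, not the project's actual script):

```python
from transformers import LlamaTokenizer, LlamaForCausalLM

# Placeholder paths; the released artifacts live in the linked repository.
tokenizer = LlamaTokenizer.from_pretrained("path/to/llama-7b")
model = LlamaForCausalLM.from_pretrained("path/to/llama-7b")

# In practice ~20,000 tokens come from a trained Chinese SentencePiece model;
# two toy tokens stand in here.
num_added = tokenizer.add_tokens(["你好", "世界"])

# New embedding rows are randomly initialized, then learned during the
# secondary pre-training stage on Chinese data.
model.resize_token_embeddings(len(tokenizer))
print(f"added {num_added} tokens; vocab size is now {len(tokenizer)}")
```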
    Meta-Polyp: a baseline for efficient Polyp segmentation. (arXiv:2305.07848v2 [eess.IV] UPDATED)
In recent years, polyp segmentation has gained significant importance, and many methods have been developed using CNN, Vision Transformer, and Transformer techniques to achieve competitive results. However, these methods often face difficulties when dealing with out-of-distribution datasets, missing boundaries, and small polyps. In 2022, Meta-Former was introduced as a new baseline for vision, which not only improved the performance of multi-task computer vision but also addressed the limitations of the Vision Transformer and CNN family backbones. To further enhance segmentation, we propose a fusion of Meta-Former with UNet, introducing a Multi-scale Upsampling block with a level-up combination in the decoder stage to enhance texture, as well as a ConvFormer block based on the Meta-Former idea to enhance the crucial information of local features. These blocks enable the combination of global information, such as the overall shape of the polyp, with local and boundary information, which is crucial for medical segmentation decisions. Our proposed approach achieved competitive performance and state-of-the-art results on the CVC-300, Kvasir, and CVC-ColonDB datasets; apart from Kvasir-SEG, these are out-of-distribution datasets. The implementation can be found at: https://github.com/huyquoctrinh/MetaPolyp-CBMS2023.
    Mixed Traffic Control and Coordination from Pixels. (arXiv:2302.09167v2 [cs.MA] UPDATED)
Traffic congestion is a persistent problem in our society. Existing methods for traffic control have proven futile in alleviating current congestion levels, leading researchers to explore ideas with robot vehicles given the increased emergence of vehicles with different levels of autonomy on our roads. This gives rise to mixed traffic control, where robot vehicles regulate human-driven vehicles through reinforcement learning (RL). However, most existing studies use precise observations that involve global information, such as environment outflow, and local information, i.e., vehicle positions and velocities. Obtaining this information requires updating existing road infrastructure with vast sensor environments and communication with potentially unwilling human drivers. We consider image observations as the alternative for mixed traffic control via RL: 1) images are ubiquitous through satellite imagery, in-car camera systems, and traffic monitoring systems; 2) images do not require a complete re-imagination of the observation space from environment to environment; and 3) images only require communication to equipment. In this work, we show robot vehicles using image observations can achieve similar performance to using precise information on environments, including ring, figure eight, intersection, merge, and bottleneck. In certain scenarios, our approach even outperforms using precision observations, e.g., up to a 26% increase in average vehicle velocity in the merge environment and a 6% increase in outflow in the bottleneck environment, despite only using local traffic information as opposed to global traffic information.  ( 2 min )
    Local Energy Distribution Based Hyperparameter Determination for Stochastic Simulated Annealing. (arXiv:2304.11839v2 [cs.LG] UPDATED)
This paper presents a local energy distribution based hyperparameter determination for stochastic simulated annealing (SSA). SSA is capable of solving combinatorial optimization problems faster than typical simulated annealing (SA), but requires a time-consuming hyperparameter search. The proposed method determines hyperparameters based on the local energy distributions of spins (probabilistic bits). A spin is a basic computing element of SSA and is connected to other spins in a graph through weights. The distribution of the local energy can be estimated based on the central limit theorem (CLT). The CLT-based normal distribution is used to determine the hyperparameters, which reduces the time complexity of the hyperparameter search from O(n^3) for the conventional method to O(1). The performance of SSA with the determined hyperparameters is evaluated on the Gset and K2000 benchmarks for maximum-cut problems. The results show that the proposed method achieves mean cut values of approximately 98% of the best-known cut values.  ( 2 min )
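A minimal sketch of the CLT idea: for random +/-1 spins, the local energy of spin i has mean 0 and variance equal to the sum of its squared couplings, so a normal approximation yields a hyperparameter such as a noise amplitude in constant time per spin. The exact mapping from these statistics to SSA's hyperparameters is the paper's contribution and is only hinted at here.

```python
import numpy as np

def local_energy_stats(W: np.ndarray):
    """Per-spin local-energy statistics under a CLT assumption.

    For spin i with couplings W[i, :], the local energy sum_j W[i, j] * s_j
    over uniform +/-1 spins has mean 0 and variance sum_j W[i, j]**2, so it
    is approximately normal for large degree. Hyperparameters can then be
    set from these statistics in O(1) per spin instead of searched for.
    """
    mean = np.zeros(W.shape[0])          # E[s_j] = 0 for uniform +/-1 spins
    std = np.sqrt((W ** 2).sum(axis=1))  # CLT-based normal approximation
    return mean, std

rng = np.random.default_rng(0)
W = rng.choice([-1.0, 0.0, 1.0], size=(2000, 2000))
W = np.triu(W, 1)
W = W + W.T                               # symmetric couplings, zero diagonal
_, std = local_energy_stats(W)
noise_scale = std.mean()                  # e.g., one global noise amplitude
```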
    On Mitigating the Utility-Loss in Differentially Private Learning: A new Perspective by a Geometrically Inspired Kernel Approach. (arXiv:2304.01300v2 [cs.LG] UPDATED)
Privacy-utility tradeoff remains one of the fundamental issues of differentially private machine learning. This paper introduces a geometrically inspired kernel-based approach to mitigate the accuracy-loss issue in classification. In this approach, a representation of the affine hull of given data points is learned in Reproducing Kernel Hilbert Spaces (RKHS). This leads to a novel distance measure that hides privacy-sensitive information about individual data points and improves the privacy-utility tradeoff by significantly reducing the risk of membership inference attacks. The effectiveness of the approach is demonstrated through experiments on the MNIST dataset, the Freiburg groceries dataset, and a real biomedical dataset. It is verified that the approach remains computationally practical. The application of the approach to federated learning is considered, and it is observed that the accuracy loss due to data being distributed is either marginal or not significantly high.  ( 2 min )
    A Meta-learning based Generalizable Indoor Localization Model using Channel State Information. (arXiv:2305.13453v2 [cs.LG] UPDATED)
Indoor localization has gained significant attention in recent years due to its various applications in smart homes, industrial automation, and healthcare, especially since more people rely on their wireless devices for location-based services. Deep learning-based solutions have shown promising results in accurately estimating the position of wireless devices in indoor environments using wireless parameters such as Channel State Information (CSI) and Received Signal Strength Indicator (RSSI). However, despite the success of deep learning-based approaches in achieving high localization accuracy, these models suffer from a lack of generalizability and cannot be readily deployed to new environments or operate in dynamic environments without retraining. In this paper, we propose meta-learning-based localization models to address the lack of generalizability that persists in conventionally trained DL-based localization models. Furthermore, since meta-learning algorithms require diverse datasets from several different scenarios, which can be hard to collect in the context of localization, we design and propose a new meta-learning algorithm, TB-MAML (Task Biased Model Agnostic Meta Learning), intended to further improve generalizability when the dataset is limited. Lastly, we evaluate the performance of TB-MAML-based localization against conventionally trained localization models and localization done using other meta-learning algorithms.  ( 2 min )
    ASPEST: Bridging the Gap Between Active Learning and Selective Prediction. (arXiv:2304.03870v2 [cs.LG] UPDATED)
Selective prediction aims to learn a reliable model that abstains from making predictions when the model uncertainty is high. These predictions can then be deferred to a human expert for further evaluation. In many real-world scenarios, the distribution of test data is different from the training data. This results in more inaccurate predictions, necessitating increased human labeling, which can be difficult and expensive. Active learning circumvents this by only querying the most informative examples and, in several cases, has been shown to lower the overall labeling effort. In this work, we bridge selective prediction and active learning, proposing a new learning paradigm called active selective prediction, which learns to query more informative samples from the shifted target domain while increasing accuracy and coverage. For this new problem, we propose a simple but effective solution, ASPEST, which utilizes ensembles of model snapshots and self-training with their aggregated outputs as pseudo-labels. Extensive experiments on numerous image, text, and structured datasets, particularly those suffering from domain shifts, demonstrate that our proposed method significantly outperforms prior work on selective prediction and active learning (e.g., on the MNIST$\to$SVHN benchmark with a labeling budget of $100$, ASPEST improves the AUC metric from $79.36\%$ to $88.84\%$) and achieves better utilization of humans in the loop.  ( 2 min )
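A minimal sketch of the snapshot-ensemble pseudo-labeling step, assuming `snapshots` is a list of checkpointed classifiers and using a hypothetical confidence threshold; ASPEST's full loop also covers querying and fine-tuning, which are omitted here.

```python
import torch

@torch.no_grad()
def pseudo_labels(snapshots, x, threshold: float = 0.8):
    """Aggregate checkpoint ensembles into pseudo-labels (a sketch).

    `snapshots` is a list of model checkpoints saved during training.
    Their softmax outputs are averaged; only confident predictions become
    pseudo-labels for self-training. The threshold value is an assumption.
    """
    probs = torch.stack([m(x).softmax(dim=-1) for m in snapshots]).mean(dim=0)
    conf, labels = probs.max(dim=-1)
    keep = conf >= threshold            # confident samples only
    return x[keep], labels[keep], conf  # conf is also usable for selection
```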
    MalProtect: Stateful Defense Against Adversarial Query Attacks in ML-based Malware Detection. (arXiv:2302.10739v2 [cs.LG] UPDATED)
ML models are known to be vulnerable to adversarial query attacks. In these attacks, queries are iteratively perturbed towards a particular class without any knowledge of the target model besides its output. The prevalence of remotely-hosted ML classification models and Machine-Learning-as-a-Service platforms means that query attacks pose a real threat to the security of these systems. To deal with this, stateful defenses have been proposed to detect query attacks and prevent the generation of adversarial examples by monitoring and analyzing the sequence of queries received by the system. Several stateful defenses have been proposed in recent years. However, these defenses rely solely on similarity or out-of-distribution detection methods that may be effective in other domains. In the malware detection domain, the methods to generate adversarial examples are inherently different, and therefore we find that such detection mechanisms are significantly less effective. Hence, in this paper, we present MalProtect, which is a stateful defense against query attacks in the malware detection domain. MalProtect uses several threat indicators to detect attacks. Our results show that it reduces the evasion rate of adversarial query attacks by over 80% in Android and Windows malware, across a range of attacker scenarios. In the first evaluation of its kind, we show that MalProtect outperforms prior stateful defenses, especially under the peak adversarial threat.  ( 2 min )
    Probabilistic relations for modelling epistemic and aleatoric uncertainty: semantics and automated reasoning with theorem proving. (arXiv:2303.09692v2 [cs.LO] UPDATED)
    Probabilistic programming combines general computer programming, statistical inference, and formal semantics to help systems make decisions when facing uncertainty. Probabilistic programs are ubiquitous, including having a significant impact on machine intelligence. While many probabilistic algorithms have been used in practice in different domains, their automated verification based on formal semantics is still a relatively new research area. In the last two decades, it has attracted much interest. Many challenges, however, remain. The work presented in this paper, probabilistic relations, takes a step towards our vision to tackle these challenges. Our work is based on Hehner's predicative probabilistic programming, but there are several obstacles to the broader adoption of his work. Our contributions here include (1) the formalisation of its syntax and semantics by introducing an Iverson bracket notation to separate relations from arithmetic; (2) the formalisation of relations using Unifying Theories of Programming (UTP) and probabilities outside the brackets using summation over the topological space of the real numbers; (3) the constructive semantics for probabilistic loops using Kleene's fixed-point theorem; (4) the enrichment of its semantics from distributions to subdistributions and superdistributions to deal with the constructive semantics; (5) the unique fixed-point theorem to simplify the reasoning about probabilistic loops; and (6) the mechanisation of our theory in Isabelle/UTP, an implementation of UTP in Isabelle/HOL, for automated reasoning using theorem proving. We demonstrate our work with six examples, including problems in robot localisation, classification in machine learning, and the termination of probabilistic loops.  ( 3 min )
    Joint optimization of a $\beta$-VAE for ECG task-specific feature extraction. (arXiv:2304.06476v2 [eess.SP] UPDATED)
    Electrocardiography is the most common method to investigate the condition of the heart through the observation of cardiac rhythm and electrical activity, for both diagnosis and monitoring purposes. Analysis of electrocardiograms (ECGs) is commonly performed through the investigation of specific patterns, which are visually recognizable by trained physicians and are known to reflect cardiac (dis)function. In this work we study the use of $\beta$-variational autoencoders (VAEs) as an explainable feature extractor, and improve on its predictive capacities by jointly optimizing signal reconstruction and cardiac function prediction. The extracted features are then used for cardiac function prediction using logistic regression. The method is trained and tested on data from 7255 patients, who were treated for acute coronary syndrome at the Leiden University Medical Center between 2010 and 2021. The results show that our method significantly improved prediction and explainability compared to a vanilla $\beta$-VAE, while still yielding similar reconstruction performance.  ( 2 min )
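A sketch of the joint objective under common beta-VAE conventions; the prediction head stands in for the logistic regression on latent features, and the weights `beta` and `lam` are assumptions, not the paper's values.

```python
import torch
import torch.nn.functional as F

def joint_loss(x, x_hat, mu, logvar, logits, y, beta=4.0, lam=1.0):
    """Joint objective sketch: reconstruction + beta-weighted KL + prediction.

    x / x_hat: ECG signal and its reconstruction; mu / logvar: latent
    posterior parameters; logits: output of a prediction head on the latent
    (standing in for logistic regression); y: float binary labels.
    beta and lam are assumed weights, not the paper's fitted values.
    """
    recon = F.mse_loss(x_hat, x)                                   # signal reconstruction
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # beta-VAE KL term
    pred = F.binary_cross_entropy_with_logits(logits, y)           # cardiac function target
    return recon + beta * kl + lam * pred
```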
    Contextual Combinatorial Bandits with Probabilistically Triggered Arms. (arXiv:2303.17110v2 [cs.LG] UPDATED)
    We study contextual combinatorial bandits with probabilistically triggered arms (C$^2$MAB-T) under a variety of smoothness conditions that capture a wide range of applications, such as contextual cascading bandits and contextual influence maximization bandits. Under the triggering probability modulated (TPM) condition, we devise the C$^2$-UCB-T algorithm and propose a novel analysis that achieves an $\tilde{O}(d\sqrt{KT})$ regret bound, removing a potentially exponentially large factor $O(1/p_{\min})$, where $d$ is the dimension of contexts, $p_{\min}$ is the minimum positive probability that any arm can be triggered, and batch-size $K$ is the maximum number of arms that can be triggered per round. Under the variance modulated (VM) or triggering probability and variance modulated (TPVM) conditions, we propose a new variance-adaptive algorithm VAC$^2$-UCB and derive a regret bound $\tilde{O}(d\sqrt{T})$, which is independent of the batch-size $K$. As a valuable by-product, our analysis technique and variance-adaptive algorithm can be applied to the CMAB-T and C$^2$MAB setting, improving existing results there as well. We also include experiments that demonstrate the improved performance of our algorithms compared with benchmark algorithms on synthetic and real-world datasets.  ( 2 min )
    Fundus vascular image segmentation based on multiple attention mechanisms and deep learning. (arXiv:2305.03617v2 [eess.IV] UPDATED)
Accurately segmenting blood vessels in retinal fundus images is crucial for the early screening, diagnosis, and evaluation of some ocular diseases, yet the task carries nontrivial uncertainty due to various factors such as significant light variations, uneven curvilinear structures, and non-uniform contrast. Accordingly, an approach based on multiple attention mechanisms and deep learning is proposed to accurately detect blood vessels in retinal fundus images. To compensate for the loss of scene information and enrich contextual information, an attention fusion mechanism that combines channel attention with Transformer-based spatial attention is employed to extract various features of blood vessels from retinal fundus images in both the spatial and channel dimensions. Subsequently, a unique spatial attention mechanism is introduced in the skip connection to filter out redundant information and noise from low-level features, thus enabling better integration with high-level features. In addition, a DropOut layer is employed to randomly discard some neurons, which can prevent overfitting of the deep learning network and improve its generalization performance.  ( 2 min )
    OpenAGI: When LLM Meets Domain Experts. (arXiv:2304.04370v3 [cs.AI] UPDATED)
    Human intelligence has the remarkable ability to assemble basic skills into complex ones so as to solve complex tasks. This ability is equally important for Artificial Intelligence (AI), and thus, we assert that in addition to the development of large, comprehensive intelligent models, it is equally crucial to equip such models with the capability to harness various domain-specific expert models for complex task-solving in the pursuit of Artificial General Intelligence (AGI). Recent developments in Large Language Models (LLMs) have demonstrated remarkable learning and reasoning abilities, making them promising as a controller to select, synthesize, and execute external models to solve complex tasks. In this project, we develop OpenAGI, an open-source AGI research platform, specifically designed to offer complex, multi-step tasks and accompanied by task-specific datasets, evaluation metrics, and a diverse range of extensible models. OpenAGI formulates complex tasks as natural language queries, serving as input to the LLM. The LLM subsequently selects, synthesizes, and executes models provided by OpenAGI to address the task. Furthermore, we propose a Reinforcement Learning from Task Feedback (RLTF) mechanism, which uses the task-solving result as feedback to improve the LLM's task-solving ability. Thus, the LLM is responsible for synthesizing various external models for solving complex tasks, while RLTF provides feedback to improve its task-solving ability, enabling a feedback loop for self-improving AI. We believe that the paradigm of LLMs operating various expert models for complex task-solving is a promising approach towards AGI. To facilitate the community's long-term improvement and evaluation of AGI's ability, we open-source the code, benchmark, and evaluation methods of the OpenAGI project at https://github.com/agiresearch/OpenAGI.  ( 3 min )
    DeAR: Accelerating Distributed Deep Learning with Fine-Grained All-Reduce Pipelining. (arXiv:2302.12445v2 [cs.LG] UPDATED)
Communication scheduling has been shown to be effective in accelerating distributed training, as it enables all-reduce communications to be overlapped with backpropagation computations. This has been commonly adopted in popular distributed deep learning frameworks. However, there exist two fundamental problems: (1) excessive startup latency proportional to the number of workers for each all-reduce operation; (2) sub-optimal training performance due to the dependency and synchronization requirements of the feed-forward computation in the next iteration. We propose a novel scheduling algorithm, DeAR, that decouples the all-reduce primitive into two continuous operations, which overlap with both backpropagation and feed-forward computations without extra communications. We further design a practical tensor fusion algorithm to improve the training performance. Experimental results with five popular models show that DeAR achieves up to 83% and 15% training speedup over state-of-the-art solutions on a 64-GPU cluster with 10Gb/s Ethernet and 100Gb/s InfiniBand interconnects, respectively.  ( 2 min )
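The decoupling can be sketched with PyTorch collectives: a reduce-scatter that can overlap with backpropagation, followed later by an all-gather that can overlap with the next feed-forward pass. Flattening, divisibility handling, and DeAR's actual scheduler are omitted; this is an assumption-laden sketch, not the paper's implementation.

```python
import torch
import torch.distributed as dist

def dear_style_allreduce(grad: torch.Tensor, world_size: int) -> torch.Tensor:
    """Decoupled all-reduce sketch: reduce-scatter now, all-gather later.

    Assumes grad is a flat tensor whose length is divisible by world_size.
    Phase 1 can be launched during backpropagation; phase 2 can be deferred
    so it overlaps with the next iteration's feed-forward computation.
    """
    shard = torch.empty(grad.numel() // world_size,
                        dtype=grad.dtype, device=grad.device)
    dist.reduce_scatter_tensor(shard, grad)    # phase 1: overlap with backprop
    # ... a scheduler would run other backprop/feed-forward work here ...
    dist.all_gather_into_tensor(grad, shard)   # phase 2: overlap with feed-forward
    return grad / world_size
```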
    Bounding the Optimal Value Function in Compositional Reinforcement Learning. (arXiv:2303.02557v2 [cs.LG] UPDATED)
    In the field of reinforcement learning (RL), agents are often tasked with solving a variety of problems differing only in their reward functions. In order to quickly obtain solutions to unseen problems with new reward functions, a popular approach involves functional composition of previously solved tasks. However, previous work using such functional composition has primarily focused on specific instances of composition functions whose limiting assumptions allow for exact zero-shot composition. Our work unifies these examples and provides a more general framework for compositionality in both standard and entropy-regularized RL. We find that, for a broad class of functions, the optimal solution for the composite task of interest can be related to the known primitive task solutions. Specifically, we present double-sided inequalities relating the optimal composite value function to the value functions for the primitive tasks. We also show that the regret of using a zero-shot policy can be bounded for this class of functions. The derived bounds can be used to develop clipping approaches for reducing uncertainty during training, allowing agents to quickly adapt to new tasks.  ( 2 min )
    Learning the Delay Using Neural Delay Differential Equations. (arXiv:2304.01329v2 [math.OC] UPDATED)
    The intersection of machine learning and dynamical systems has generated considerable interest recently. Neural Ordinary Differential Equations (NODEs) represent a rich overlap between these fields. In this paper, we develop a continuous time neural network approach based on Delay Differential Equations (DDEs). Our model uses the adjoint sensitivity method to learn the model parameters and delay directly from data. Our approach is inspired by that of NODEs and extends earlier neural DDE models, which have assumed that the value of the delay is known a priori. We perform a sensitivity analysis on our proposed approach and demonstrate its ability to learn DDE parameters from benchmark systems. We conclude our discussion with potential future directions and applications.  ( 2 min )
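A minimal discretized sketch of the idea, with the delay tau as a learnable parameter and linear interpolation into a stored history so the loss stays differentiable in tau; the paper instead uses the adjoint sensitivity method on a proper DDE solver, so everything below is an illustrative assumption.

```python
import torch
import torch.nn as nn

class NeuralDDE(nn.Module):
    """Minimal discretized DDE sketch with a learnable delay `tau`.

    Models dy/dt = f(y(t), y(t - tau)); the history is kept on a fixed grid
    and y(t - tau) is obtained by linear interpolation, which keeps the loss
    differentiable in tau. (The paper's adjoint approach is more exact.)
    """
    def __init__(self, dim: int, dt: float = 0.01):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(2 * dim, 64), nn.Tanh(), nn.Linear(64, dim))
        self.tau = nn.Parameter(torch.tensor(0.5))  # learnable delay
        self.dt = dt

    def delayed(self, hist: list, step: int) -> torch.Tensor:
        s = step - self.tau / self.dt               # fractional index of t - tau
        lo = int(torch.clamp(torch.floor(s), 0).item())
        hi = min(lo + 1, len(hist) - 1)
        w = torch.clamp(s - lo, 0.0, 1.0)           # weight carries grad w.r.t. tau
        return (1 - w) * hist[lo] + w * hist[hi]    # linear interpolation

    def forward(self, y0: torch.Tensor, steps: int) -> torch.Tensor:
        hist = [y0]
        for k in range(steps):                      # forward Euler integration
            y_del = self.delayed(hist, k)
            hist.append(hist[-1] + self.dt * self.f(torch.cat([hist[-1], y_del], -1)))
        return torch.stack(hist)
```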
    A Markovian Formalism for Active Querying. (arXiv:2306.08001v1 [cs.LG])
Active learning algorithms have been an integral part of recent advances in artificial intelligence. However, research in the field is widely varying and lacks an overall organizing lens. We outline a Markovian formalism for the field of active learning and survey the literature to demonstrate the organizing capability of our proposed formalism. Our formalism takes a partially observable Markovian system approach to the active learning process as a whole. We specifically outline how querying, dataset augmentation, reward updates, and other aspects of active learning can be viewed as transitions between meta-states in a Markovian system, and give direction on how other aspects of active learning can fit into our formalism.  ( 2 min )
    Flexible Channel Dimensions for Differentiable Architecture Search. (arXiv:2306.08021v1 [cs.LG])
Finding optimal channel dimensions (i.e., the number of filters in DNN layers) is essential to design DNNs that perform well under computational resource constraints. Recent work in neural architecture search aims at automating the optimization of the DNN model implementation. However, existing neural architecture search methods for channel dimensions rely on fixed search spaces, which prevents achieving an efficient and fully automated solution. In this work, we propose a novel differentiable neural architecture search method with an efficient dynamic channel allocation algorithm to enable a flexible search space for channel dimensions. We show that the proposed framework is able to find DNN architectures that are equivalent to previous methods in task accuracy and inference latency for the CIFAR-10 dataset, with an improvement of $1.3-1.7\times$ in GPU-hours and $1.5-1.7\times$ in the memory requirements during the architecture search stage. Moreover, the proposed framework does not require a well-engineered search space a priori, which is an important step towards the fully automated design of DNN architectures.  ( 2 min )
    Domain-Aware Few-Shot Learning for Optical Coherence Tomography Noise Reduction. (arXiv:2306.08102v1 [eess.IV])
Speckle noise has long been an extensively studied problem in medical imaging. In recent years, there have been significant advances in leveraging deep learning methods for noise reduction. Nevertheless, adaptation of supervised learning models to unseen domains remains a challenging problem. Specifically, deep neural networks (DNNs) trained for computational imaging tasks are vulnerable to changes in the acquisition system's physical parameters, such as sampling space, resolution, and contrast. Even within the same acquisition system, performance degrades across datasets of different biological tissues. In this work, we propose a few-shot supervised learning framework for optical coherence tomography (OCT) noise reduction that offers a dramatic increase in training speed and requires only a single image, or part of an image, and a corresponding speckle-suppressed ground truth for training. Furthermore, we formulate the domain shift problem for diverse OCT imaging systems, and prove that the output resolution of a despeckling-trained model is determined by the source domain resolution. We also provide possible remedies. We propose different practical implementations of our approach, and verify and compare their applicability, robustness, and computational efficiency. Our results demonstrate significant potential for generally improving sample complexity, generalization, and time efficiency for coherent and non-coherent noise reduction via supervised learning models, which can also be leveraged for other real-time computer vision applications.  ( 2 min )
    Safe Use of Neural Networks. (arXiv:2306.08086v1 [eess.SP])
Neural networks in modern communication systems can be susceptible to internal numerical errors that can drastically affect decision results. Such structures are composed of many sections, each of which generally contains weighting operations and activation function evaluations. Safe use comes from methods employing number-based codes that can detect arithmetic errors in the network's processing steps. Each set of operations generates parity values dictated by a code in two ways. One set of parities is obtained from a section's outputs while a second comparable set is developed directly from the original inputs. The parity values protecting the activation functions involve a Taylor series approximation to the activation functions. We focus on using long numerically based convolutional codes because of the large size of data sets. The codes are based on Discrete Fourier Transform kernels and there are many design options available. Mathematical program simulations show our error-detecting techniques are effective and efficient.  ( 2 min )
    DTW k-means clustering for fault detection in photovoltaic modules. (arXiv:2306.08003v1 [cs.LG])
The increase in the use of photovoltaic (PV) energy in the world has shown that the useful life and maintenance of a PV plant directly depend on the ability to quickly detect severe faults on the plant. To solve this detection problem, data-based approaches have been proposed in the literature. However, these previous solutions consider only the specific behavior of one or a few faults. Most of these approaches can be qualified as supervised, requiring an enormous labelling effort (fault types clearly identified in each technology). In addition, most of them are validated on PV cells or a single PV module, which is hardly applicable to large-scale PV plants considering their complexity. Alternatively, some well-known unsupervised data-driven approaches try to detect anomalies but are not able to precisely identify the type of fault. The most performant of these methods do manage to efficiently group healthy panels and separate them from faulty panels. Accordingly, this article presents an unsupervised approach called DTW K-means, which takes advantage of both the dynamic time warping (DTW) metric and the K-means clustering algorithm as a data-driven approach, as sketched below. The results of this mixed method on a PV string are compared to diagnostic labels established by visual inspection of the panels.  ( 2 min )
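The core of the approach fits in a few lines with the tslearn library; the input file name and the two-cluster setup (healthy vs. faulty) are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np
from tslearn.clustering import TimeSeriesKMeans
from tslearn.preprocessing import TimeSeriesScalerMeanVariance

# Hypothetical input: one daily power curve per panel, shape (n_panels, n_samples, 1)
X = np.load("pv_string_daily_curves.npy")       # assumed file name
X = TimeSeriesScalerMeanVariance().fit_transform(X)

# Two clusters (healthy vs. faulty); DTW absorbs time misalignments
# between curves that plain Euclidean k-means would penalize.
km = TimeSeriesKMeans(n_clusters=2, metric="dtw", random_state=0)
labels = km.fit_predict(X)
```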
    Graph Structure and Feature Extrapolation for Out-of-Distribution Generalization. (arXiv:2306.08076v1 [cs.LG])
    Out-of-distribution (OOD) generalization deals with the prevalent learning scenario where test distribution shifts from training distribution. With rising application demands and inherent complexity, graph OOD problems call for specialized solutions. While data-centric methods exhibit performance enhancements on many generic machine learning tasks, there is a notable absence of data augmentation methods tailored for graph OOD generalization. In this work, we propose to achieve graph OOD generalization with the novel design of non-Euclidean-space linear extrapolation. The proposed augmentation strategy extrapolates both structure and feature spaces to generate OOD graph data. Our design tailors OOD samples for specific shifts without corrupting underlying causal mechanisms. Theoretical analysis and empirical results evidence the effectiveness of our method in solving target shifts, showing substantial and constant improvements across various graph OOD tasks.  ( 2 min )
    Privacy Inference-Empowered Stealthy Backdoor Attack on Federated Learning under Non-IID Scenarios. (arXiv:2306.08011v1 [cs.LG])
    Federated learning (FL) naturally faces the problem of data heterogeneity in real-world scenarios, but this is often overlooked by studies on FL security and privacy. On the one hand, the effectiveness of backdoor attacks on FL may drop significantly under non-IID scenarios. On the other hand, malicious clients may steal private data through privacy inference attacks. Therefore, it is necessary to have a comprehensive perspective of data heterogeneity, backdoor, and privacy inference. In this paper, we propose a novel privacy inference-empowered stealthy backdoor attack (PI-SBA) scheme for FL under non-IID scenarios. Firstly, a diverse data reconstruction mechanism based on generative adversarial networks (GANs) is proposed to produce a supplementary dataset, which can improve the attacker's local data distribution and support more sophisticated strategies for backdoor attacks. Based on this, we design a source-specified backdoor learning (SSBL) strategy as a demonstration, allowing the adversary to arbitrarily specify which classes are susceptible to the backdoor trigger. Since the PI-SBA has an independent poisoned data synthesis process, it can be integrated into existing backdoor attacks to improve their effectiveness and stealthiness in non-IID scenarios. Extensive experiments based on MNIST, CIFAR10 and Youtube Aligned Face datasets demonstrate that the proposed PI-SBA scheme is effective in non-IID FL and stealthy against state-of-the-art defense methods.  ( 2 min )
    Causal Feature Engineering of Price Directions of Cryptocurrencies using Dynamic Bayesian Networks. (arXiv:2306.08157v1 [cs.LG])
    Cryptocurrencies have gained popularity across various sectors, especially in finance and investment. The popularity is partly due to their unique specifications originating from blockchain-related characteristics such as privacy, decentralisation, and untraceability. Despite their growing popularity, cryptocurrencies remain a high-risk investment due to their price volatility and uncertainty. The inherent volatility in cryptocurrency prices, coupled with internal cryptocurrency-related factors and external influential global economic factors makes predicting their prices and price movement directions challenging. Nevertheless, the knowledge obtained from predicting the direction of cryptocurrency prices can provide valuable guidance for investors in making informed investment decisions. To address this issue, this paper proposes a dynamic Bayesian network (DBN) approach, which can model complex systems in multivariate settings, to predict the price movement direction of five popular altcoins (cryptocurrencies other than Bitcoin) in the next trading day. The efficacy of the proposed model in predicting cryptocurrency price directions is evaluated from two perspectives. Firstly, our proposed approach is compared to two baseline models, namely an auto-regressive integrated moving average and support vector regression. Secondly, from a feature engineering point of view, the impact of twenty-three different features, grouped into four categories, on the DBN's prediction performance is investigated. The experimental results demonstrate that the DBN significantly outperforms the baseline models. In addition, among the groups of features, technical indicators are found to be the most effective predictors of cryptocurrency price directions.  ( 3 min )
    Tune As You Scale: Hyperparameter Optimization For Compute Efficient Training. (arXiv:2306.08055v1 [cs.LG])
    Hyperparameter tuning of deep learning models can lead to order-of-magnitude performance gains for the same amount of compute. Despite this, systematic tuning is uncommon, particularly for large models, which are expensive to evaluate and tend to have many hyperparameters, necessitating difficult judgment calls about tradeoffs, budgets, and search bounds. To address these issues and propose a practical method for robustly tuning large models, we present Cost-Aware Pareto Region Bayesian Search (CARBS), a Bayesian optimization algorithm that performs local search around the performance-cost Pareto frontier. CARBS does well even in unbounded search spaces with many hyperparameters, learns scaling relationships so that it can tune models even as they are scaled up, and automates much of the "black magic" of tuning. Among our results, we effectively solve the entire ProcGen benchmark just by tuning a simple baseline (PPO, as provided in the original ProcGen paper). We also reproduce the model size vs. training tokens scaling result from the Chinchilla project (Hoffmann et al. 2022), while simultaneously discovering scaling laws for every other hyperparameter, via an easy automated process that uses significantly less compute and is applicable to any deep learning problem (not just language models).  ( 2 min )
    Self-supervised Deep Hyperspectral Inpainting with the Sparsity and Low-Rank Considerations. (arXiv:2306.08128v1 [eess.IV])
Hyperspectral images are typically composed of hundreds of narrow and contiguous spectral bands, each containing information about the material composition of the imaged scene. However, these images can be affected by various sources of noise, distortions, or data losses, which can significantly degrade their quality and usefulness. To address these problems, we introduce two novel self-supervised Hyperspectral Image (HSI) inpainting algorithms: Low Rank and Sparsity Constraint Plug-and-Play (LRS-PnP), and its extension LRS-PnP-DIP, which features strong learning capability while remaining free of external training data. We conduct a stability analysis under mild assumptions that guarantees the algorithm converges, which is particularly helpful for practical applications. Extensive experiments demonstrate that the proposed solution is able to produce visually and qualitatively superior inpainting results, achieving state-of-the-art performance. The code for reproducing the results is available at https://github.com/shuoli0708/LRS-PnP-DIP.
    Software Supply Chain Vulnerabilities Detection in Source Code: Performance Comparison between Traditional and Quantum Machine Learning Algorithms. (arXiv:2306.08060v1 [cs.CR])
The software supply chain (SSC) attack has become one of the crucial issues that is increasing rapidly with the advancement of the software development domain. In general, SSC attacks executed during the software development process lead to vulnerabilities in software products, targeting downstream customers and even involved stakeholders. Machine learning approaches have proven effective in detecting and preventing software security vulnerabilities. Moreover, emerging quantum machine learning can be promising in addressing SSC attacks. Considering the distinction between traditional and quantum machine learning, performance can vary based on the proportion of the dataset used for experimentation. In this paper, we conduct a comparative analysis between quantum neural networks (QNN) and conventional neural networks (NN) with a software supply chain attack dataset known as ClaMP. Our goal is to distinguish the performance between QNN and NN; to conduct the experiment, we develop two different models, utilizing PennyLane for the quantum model and TensorFlow with Keras for the traditional one. We evaluate the performance of both models with different proportions of the ClaMP dataset to identify the F1 score, recall, precision, and accuracy. We also measure the execution time to check the efficiency of both models. The results indicate that execution time for the QNN is slower than for the NN at higher dataset proportions. Given recent advancements in QNN, larger-scale experiments should be carried out in future research to understand both models accurately.
    ppAURORA: Privacy Preserving Area Under Receiver Operating Characteristic and Precision-Recall Curves. (arXiv:2102.08788v3 [cs.LG] UPDATED)
    Computing an AUC as a performance measure to compare the quality of different machine learning models is one of the final steps of many research projects. Many of these methods are trained on privacy-sensitive data and there are several different approaches like $\epsilon$-differential privacy, federated machine learning and cryptography if the datasets cannot be shared or used jointly at one place for training and/or testing. In this setting, it can also be a problem to compute the global AUC, since the labels might also contain privacy-sensitive information. There have been approaches based on $\epsilon$-differential privacy to address this problem, but to the best of our knowledge, no exact privacy preserving solution has been introduced. In this paper, we propose an MPC-based solution, called ppAURORA, with private merging of individually sorted lists from multiple sources to compute the exact AUC as one could obtain on the pooled original test samples. With ppAURORA, the computation of the exact area under precision-recall and receiver operating characteristic curves is possible even when ties between prediction confidence values exist. We use ppAURORA to evaluate two different models predicting acute myeloid leukemia therapy response and heart disease, respectively. We also assess its scalability via synthetic data experiments. All these experiments show that we efficiently and privately compute the exact same AUC with both evaluation metrics as one can obtain on the pooled test samples in plaintext according to the semi-honest adversary setting.
    Expanding Versatility of Agile Locomotion through Policy Transitions Using Latent State Representation. (arXiv:2306.08224v1 [cs.RO])
    This paper proposes the transition-net, a robust transition strategy that expands the versatility of robot locomotion in the real-world setting. To this end, we start by distributing the complexity of different gaits into dedicated locomotion policies applicable to real-world robots. Next, we expand the versatility of the robot by unifying the policies with robust transitions into a single coherent meta-controller by examining the latent state representations. Our approach enables the robot to iteratively expand its skill repertoire and robustly transition between any policy pair in a library. In our framework, adding new skills does not introduce any process that alters the previously learned skills. Moreover, training of a locomotion policy takes less than an hour with a single consumer GPU. Our approach is effective in the real-world and achieves a 19% higher average success rate for the most challenging transition pairs in our experiments compared to existing approaches.
    Why Using Either Aggregated Features or Adjacency Lists in Directed or Undirected Graph? Empirical Study and Simple Classification Method. (arXiv:2306.08274v1 [cs.LG])
    Node classification is one of the hottest tasks in graph analysis. In this paper, we focus on the choices of node representations (aggregated features vs. adjacency lists) and the edge direction of an input graph (directed vs. undirected), which have a large influence on classification results. We address the first empirical study to benchmark the performance of various GNNs that use either combination of node representations and edge directions. Our experiments demonstrate that no single combination stably achieves state-of-the-art results across datasets, which indicates that we need to select appropriate combinations depending on the characteristics of datasets. In response, we propose a simple yet holistic classification method A2DUG which leverages all combinations of node representation variants in directed and undirected graphs. We demonstrate that A2DUG stably performs well on various datasets. Surprisingly, it largely outperforms the current state-of-the-art methods in several datasets. This result validates the importance of the adaptive effect control on the combinations of node representations and edge directions.
    LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting. (arXiv:2306.08259v1 [cs.LG])
    Traffic forecasting plays a critical role in smart city initiatives and has experienced significant advancements thanks to the power of deep learning in capturing non-linear patterns of traffic data. However, the promising results achieved on current public datasets may not be applicable to practical scenarios due to limitations within these datasets. First, the limited sizes of them may not reflect the real-world scale of traffic networks. Second, the temporal coverage of these datasets is typically short, posing hurdles in studying long-term patterns and acquiring sufficient samples for training deep models. Third, these datasets often lack adequate metadata for sensors, which compromises the reliability and interpretability of the data. To mitigate these limitations, we introduce the LargeST benchmark dataset. It encompasses a total number of 8,600 sensors with a 5-year time coverage and includes comprehensive metadata. Using LargeST, we perform in-depth data analysis to extract data insights, benchmark well-known baselines in terms of their performance and efficiency, and identify challenges as well as opportunities for future research. We release the datasets and baseline implementations at: https://github.com/liuxu77/LargeST.
    Detection and classification of faults aimed at preventive maintenance of PV systems. (arXiv:2306.08004v1 [cs.LG])
Diagnosis in PV systems aims to detect, locate, and identify faults. Diagnosing these faults is vital to guarantee energy production and extend the useful life of PV power plants. In the literature, multiple machine learning approaches have been proposed for this purpose. However, few of these works have paid special attention to the detection of fine faults and the specialized process of extracting and selecting features for their classification. A fine fault is one whose characteristic signature is difficult to distinguish from that of a healthy panel. As a contribution to the detection of fine faults (especially of the snail trail type), this article proposes an innovative approach based on the Random Forest (RF) algorithm. This approach uses a complex feature extraction and selection method that improves the computational time of fault classification while maintaining high accuracy.  ( 2 min )
    INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation. (arXiv:2306.08162v1 [cs.CL])
    We introduce a method that dramatically reduces fine-tuning VRAM requirements and rectifies quantization errors in quantized Large Language Models. First, we develop an extremely memory-efficient fine-tuning (EMEF) method for quantized models using Low-Rank Adaptation (LoRA), and drawing upon it, we construct an error-correcting algorithm designed to minimize errors induced by the quantization process. Our method reduces the memory requirements by up to 5.6 times, which enables fine-tuning a 7 billion parameter Large Language Model (LLM) on consumer laptops. At the same time, we propose a Low-Rank Error Correction (LREC) method that exploits the added LoRA layers to ameliorate the gap between the quantized model and its float point counterpart. Our error correction framework leads to a fully functional INT2 quantized LLM with the capacity to generate coherent English text. To the best of our knowledge, this is the first INT2 Large Language Model that has been able to reach such a performance. The overhead of our method is merely a 1.05 times increase in model size, which translates to an effective precision of INT2.1. Also, our method readily generalizes to other quantization standards, such as INT3, INT4, and INT8, restoring their lost performance, which marks a significant milestone in the field of model quantization. The strategies delineated in this paper hold promising implications for the future development and optimization of quantized models, marking a pivotal shift in the landscape of low-resource machine learning computations.
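A weight-space sketch of the error-correction idea: train low-rank LoRA factors to absorb the gap between a quantized weight matrix and its float counterpart. The paper's LREC objective is richer (it targets model behaviour, not just weights), so treat this simplified variant as an assumption for illustration.

```python
import torch

def fit_error_corrector(W: torch.Tensor, W_q: torch.Tensor, rank: int = 16,
                        steps: int = 500, lr: float = 1e-3):
    """Sketch: train low-rank factors B @ A to absorb quantization error.

    Minimizes ||W - (W_q + B @ A)||^2 so the added LoRA layers compensate
    the gap between the quantized weight W_q and its float counterpart W.
    Rank, step count, and learning rate are assumed values.
    """
    out_f, in_f = W.shape
    A = torch.zeros(rank, in_f, requires_grad=True)   # standard LoRA init: A = 0
    B = torch.randn(out_f, rank, requires_grad=True)
    opt = torch.optim.Adam([A, B], lr=lr)
    for _ in range(steps):
        loss = (W - (W_q + B @ A)).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return A.detach(), B.detach()
```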
    CipherSniffer: Classifying Cipher Types. (arXiv:2306.08116v1 [cs.CL])
    Ciphers are a powerful tool for encrypting communication. There are many different cipher types, which makes it computationally expensive to solve a cipher using brute force. In this paper, we frame the decryption task as a classification problem. We first create a dataset of transpositions, substitutions, text reversals, word reversals, sentence shifts, and unencrypted text. Then, we evaluate the performance of various tokenizer-model combinations on this task.
    Safeguarding Data in Multimodal AI: A Differentially Private Approach to CLIP Training. (arXiv:2306.08173v1 [cs.LG])
    The surge in multimodal AI's success has sparked concerns over data privacy in vision-and-language tasks. While CLIP has revolutionized multimodal learning through joint training on images and text, its potential to unintentionally disclose sensitive information necessitates the integration of privacy-preserving mechanisms. We introduce a differentially private adaptation of the Contrastive Language-Image Pretraining (CLIP) model that effectively addresses privacy concerns while retaining accuracy. Our proposed method, Dp-CLIP, is rigorously evaluated on benchmark datasets encompassing diverse vision-and-language tasks such as image classification and visual question answering. We demonstrate that our approach retains performance on par with the standard non-private CLIP model. Furthermore, we analyze our proposed algorithm under linear representation settings. We derive the convergence rate of our algorithm and show a trade-off between utility and privacy when gradients are clipped per-batch and the loss function does not satisfy smoothness conditions assumed in the literature for the analysis of DP-SGD.
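A minimal sketch of the per-batch clipping regime mentioned above: the aggregated batch gradient is clipped as a single vector and Gaussian noise is added. The noise multiplier is an assumption; calibrating it to a target (epsilon, delta) via a privacy accountant is omitted.

```python
import torch

def dp_batch_gradient(batch_grad: torch.Tensor, clip_norm: float,
                      noise_mult: float) -> torch.Tensor:
    """Per-batch gradient clipping + Gaussian noise (illustrative sketch).

    Clips the aggregated batch gradient to norm <= clip_norm, then adds
    isotropic Gaussian noise scaled by noise_mult * clip_norm -- the usual
    DP-SGD recipe adapted to per-batch (rather than per-sample) clipping.
    """
    norm = batch_grad.norm()
    clipped = batch_grad * (clip_norm / torch.clamp(norm, min=clip_norm))
    noise = torch.randn_like(clipped) * noise_mult * clip_norm
    return clipped + noise
```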
    FRIGATE: Frugal Spatio-temporal Forecasting on Road Networks. (arXiv:2306.08277v1 [cs.LG])
Modelling spatio-temporal processes on road networks is a task of growing importance. While significant progress has been made on developing spatio-temporal graph neural networks (GNNs), existing works are built upon three assumptions that are not practical on real-world road networks. First, they assume sensing on every node of a road network. In reality, due to budget constraints or sensor failures, all locations (nodes) may not be equipped with sensors. Second, they assume that sensing history is available at all installed sensors. This is unrealistic as well due to sensor failures, loss of packets during communication, etc. Finally, there is an assumption of static road networks. Connectivity within networks changes due to road closures, construction of new roads, etc. In this work, we develop FRIGATE to address all these shortcomings. FRIGATE is powered by a spatio-temporal GNN that integrates positional, topological, and temporal information into rich inductive node representations. The joint fusion of this diverse information is made feasible through a novel combination of gated Lipschitz embeddings with LSTMs. We prove that the proposed GNN architecture is more expressive than the message-passing GNNs used in state-of-the-art algorithms. The higher expressivity of FRIGATE naturally translates to superior empirical performance on real-world network-constrained traffic data. In addition, FRIGATE is robust to frugal sensor deployment, changes in road network connectivity, and temporal irregularity in sensing.
    Operationalising Representation in Natural Language Processing. (arXiv:2306.08193v1 [cs.CL])
    Despite its centrality in the philosophy of cognitive science, there has been little prior philosophical work engaging with the notion of representation in contemporary NLP practice. This paper attempts to fill that lacuna: drawing on ideas from cognitive science, I introduce a framework for evaluating the representational claims made about components of neural NLP models, proposing three criteria with which to evaluate whether a component of a model represents a property and operationalising these criteria using probing classifiers, a popular analysis technique in NLP (and deep learning more broadly). The project of operationalising a philosophically-informed notion of representation should be of interest to both philosophers of science and NLP practitioners. It affords philosophers a novel testing-ground for claims about the nature of representation, and helps NLPers organise the large literature on probing experiments, suggesting novel avenues for empirical research.
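Probing classifiers, the analysis technique the paper operationalises its criteria with, reduce to training a simple classifier on frozen model representations. A minimal sketch (the arrays are random placeholders standing in for real activations and property labels):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

H = np.random.randn(1000, 768)            # frozen hidden states, one per token
y = np.random.randint(0, 2, size=1000)    # property labels (e.g., is-noun)

H_tr, H_te, y_tr, y_te = train_test_split(H, y, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(H_tr, y_tr)
print("probe accuracy:", probe.score(H_te, y_te))  # ~0.5 here, since data is random
```

High probe accuracy on real activations is then taken as evidence that the representation encodes the probed property.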
    Robustness to Transformations Across Categories: Is Robustness To Transformations Driven by Invariant Neural Representations?. (arXiv:2007.00112v4 [cs.CV] UPDATED)
Deep Convolutional Neural Networks (DCNNs) have demonstrated impressive robustness to recognize objects under transformations (e.g., blur or noise) when these transformations are included in the training set. A hypothesis to explain such robustness is that DCNNs develop invariant neural representations that remain unaltered when the image is transformed. However, to what extent this hypothesis holds true is an outstanding question, as robustness to transformations could be achieved with properties different from invariance, e.g., parts of the network could be specialized to recognize either transformed or non-transformed images. This paper investigates the conditions under which invariant neural representations emerge by leveraging that they facilitate robustness to transformations beyond the training distribution. Concretely, we analyze a training paradigm in which only some object categories are seen transformed during training and evaluate whether the DCNN is robust to transformations across categories not seen transformed. Our results with state-of-the-art DCNNs indicate that invariant neural representations do not always drive robustness to transformations, as networks show robustness for categories seen transformed during training even in the absence of invariant neural representations. Invariance only emerges as the number of transformed categories in the training set is increased. This phenomenon is much more prominent with local transformations such as blurring and high-pass filtering than geometric transformations such as rotation and thinning, which entail changes in the spatial arrangement of the object. Our results contribute to a better understanding of invariant neural representations in deep learning and the conditions under which they spontaneously emerge.
    Survey on Sociodemographic Bias in Natural Language Processing. (arXiv:2306.08158v1 [cs.CL])
    Deep neural networks often learn unintended biases during training, which might have harmful effects when deployed in real-world settings. This paper surveys 209 papers on bias in NLP models, most of which address sociodemographic bias. To better understand the distinction between bias and real-world harm, we turn to ideas from psychology and behavioral economics to propose a definition for sociodemographic bias. We identify three main categories of NLP bias research: types of bias, quantifying bias, and debiasing. We conclude that current approaches on quantifying bias face reliability issues, that many of the bias metrics do not relate to real-world biases, and that current debiasing techniques are superficial and hide bias rather than removing it. Finally, we provide recommendations for future work.
    Graph Laplacian Learning with Exponential Family Noise. (arXiv:2306.08201v1 [cs.LG])
    A common challenge in applying graph machine learning methods is that the underlying graph of a system is often unknown. Although different graph inference methods have been proposed for continuous graph signals, inferring the graph structure underlying other types of data, such as discrete counts, is under-explored. In this paper, we generalize a graph signal processing (GSP) framework for learning a graph from smooth graph signals to the exponential family noise distribution to model various data types. We propose an alternating algorithm that estimates the graph Laplacian as well as the unobserved smooth representation from the noisy signals. We demonstrate in synthetic and real-world data that our new algorithm outperforms competing Laplacian estimation methods under noise model mismatch.
    DCTX-Conformer: Dynamic context carry-over for low latency unified streaming and non-streaming Conformer. (arXiv:2306.08175v1 [eess.AS])
    Conformer-based end-to-end models have become ubiquitous these days and are commonly used in both streaming and non-streaming automatic speech recognition (ASR). Techniques like dual-mode and dynamic chunk training helped unify streaming and non-streaming systems. However, there remains a performance gap between streaming with a full and limited past context. To address this issue, we propose the integration of a novel dynamic contextual carry-over mechanism in a state-of-the-art (SOTA) unified ASR system. Our proposed dynamic context Conformer (DCTX-Conformer) utilizes a non-overlapping contextual carry-over mechanism that takes into account both the left context of a chunk and one or more preceding context embeddings. We outperform the SOTA by a relative 25.0% word error rate, with a negligible latency impact due to the additional context embeddings.
    Accelerated Convergence of Nesterov's Momentum for Deep Neural Networks under Partial Strong Convexity. (arXiv:2306.08109v1 [cs.LG])
Current state-of-the-art analyses on the convergence of gradient descent for training neural networks focus on characterizing properties of the loss landscape, such as the Polyak-Lojaciewicz (PL) condition and the restricted strong convexity. While gradient descent converges linearly under such conditions, it remains an open question whether Nesterov's momentum enjoys accelerated convergence under similar settings and assumptions. In this work, we consider a new class of objective functions, where only a subset of the parameters satisfies strong convexity, and show that Nesterov's momentum achieves acceleration in theory for this objective class. We provide two realizations of the problem class, one of which is deep ReLU networks, which, to the best of our knowledge, makes this work the first to prove an accelerated convergence rate for non-trivial neural network architectures.
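For reference, the update under analysis is textbook Nesterov momentum, sketched here in lookahead form (the paper's own notation may differ):

```python
def nesterov_step(w, v, grad_fn, lr=0.01, beta=0.9):
    """w: parameters, v: velocity, grad_fn: gradient of the loss."""
    g = grad_fn(w + beta * v)   # evaluate the gradient at the lookahead point
    v = beta * v - lr * g       # update the velocity
    return w + v, v             # apply the velocity to the parameters
```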
    Reinforcement Learning-Driven Linker Design via Fast Attention-based Point Cloud Alignment. (arXiv:2306.08166v1 [cs.LG])
    Proteolysis-Targeting Chimeras (PROTACs) represent a novel class of small molecules which are designed to act as a bridge between an E3 ligase and a disease-relevant protein, thereby promoting its subsequent degradation. PROTACs are composed of two protein binding "active" domains, linked by a "linker" domain. The design of the linker domain is challenging due to geometric and chemical constraints given by its interactions, and the need to maximize drug-likeness. To tackle these challenges, we introduce ShapeLinker, a method for de novo design of linkers. It performs fragment-linking using reinforcement learning on an autoregressive SMILES generator. The method optimizes for a composite score combining relevant physicochemical properties and a novel, attention-based point cloud alignment score. This new method successfully generates linkers that satisfy both relevant 2D and 3D requirements, and achieves state-of-the-art results in producing novel linkers assuming a target linker conformation. This allows for more rational and efficient PROTAC design and optimization. Code and data are available at https://github.com/aivant/ShapeLinker.
    Multi-market Energy Optimization with Renewables via Reinforcement Learning. (arXiv:2306.08147v1 [cs.LG])
    This paper introduces a deep reinforcement learning (RL) framework for optimizing the operations of power plants pairing renewable energy with storage. The objective is to maximize revenue from energy markets while minimizing storage degradation costs and renewable curtailment. The framework handles complexities such as time coupling by storage devices, uncertainty in renewable generation and energy prices, and non-linear storage models. The study treats the problem as a hierarchical Markov Decision Process (MDP) and uses component-level simulators for storage. It utilizes RL to incorporate complex storage models, overcoming restrictions of optimization-based methods that require convex and differentiable component models. A significant aspect of this approach is ensuring policy actions respect system constraints, achieved via a novel method of projecting potentially infeasible actions onto a safe state-action set. The paper demonstrates the efficacy of this approach through extensive experiments using data from US and Indian electricity markets, comparing the learned RL policies with a baseline control policy and a retrospective optimal control policy. It validates the adaptability of the learning framework with various storage models and shows the effectiveness of RL in a complex energy optimization setting, in the context of multi-market bidding, probabilistic forecasts, and accurate storage component models.
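The paper's general projection onto a safe state-action set is richer than what fits here, but the idea can be illustrated with a simple box projection for a storage device (function name and limits are hypothetical):

```python
import numpy as np

def project_storage_action(action, soc, capacity, max_rate, dt=1.0):
    """Clip a charge/discharge action so state of charge stays in [0, capacity]."""
    lo = max(-max_rate, -soc / dt)               # cannot discharge below empty
    hi = min(max_rate, (capacity - soc) / dt)    # cannot charge above full
    return float(np.clip(action, lo, hi))

print(project_storage_action(5.0, soc=9.0, capacity=10.0, max_rate=2.0))  # -> 1.0
```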
    Where Does My Model Underperform? A Human Evaluation of Slice Discovery Algorithms. (arXiv:2306.08167v1 [cs.HC])
    Machine learning (ML) models that achieve high average accuracy can still underperform on semantically coherent subsets (i.e. "slices") of data. This behavior can have significant societal consequences for the safety or bias of the model in deployment, but identifying these underperforming slices can be difficult in practice, especially in domains where practitioners lack access to group annotations to define coherent subsets of their data. Motivated by these challenges, ML researchers have developed new slice discovery algorithms that aim to group together coherent and high-error subsets of data. However, there has been little evaluation focused on whether these tools help humans form correct hypotheses about where (for which groups) their model underperforms. We conduct a controlled user study (N = 15) where we show 40 slices output by two state-of-the-art slice discovery algorithms to users, and ask them to form hypotheses about where an object detection model underperforms. Our results provide positive evidence that these tools provide some benefit over a naive baseline, and also shed light on challenges faced by users during the hypothesis formation step. We conclude by discussing design opportunities for ML and HCI researchers. Our findings point to the importance of centering users when designing and evaluating new tools for slice discovery.
    Uncertainty-Aware Robust Learning on Noisy Graphs. (arXiv:2306.08210v1 [cs.LG])
Graph neural networks have shown impressive capabilities in solving various graph learning tasks, particularly excelling in node classification. However, their effectiveness can be hindered by the challenges arising from the widespread existence of noisy measurements associated with the topological or nodal information present in real-world graphs. These inaccuracies in observations can corrupt the crucial patterns within the graph data, ultimately resulting in undesirable performance in practical applications. To address these issues, this paper proposes a novel uncertainty-aware graph learning framework motivated by distributionally robust optimization. Specifically, we use a graph neural network-based encoder to embed the node features and find the optimal node embeddings by minimizing the worst-case risk through a minimax formulation. Such an uncertainty-aware learning process leads to improved node representations and a more robust graph predictive model that effectively mitigates the impact of uncertainty arising from data noise. Our experimental results show that the proposed framework achieves superior predictive performance compared to the state-of-the-art baselines under various noisy settings.
    Can ChatGPT Enable ITS? The Case of Mixed Traffic Control via Reinforcement Learning. (arXiv:2306.08094v1 [cs.AI])
The surge in Reinforcement Learning (RL) applications in Intelligent Transportation Systems (ITS) has contributed to its growth as well as highlighted key challenges. However, defining objectives of RL agents in traffic control and management tasks, as well as aligning policies with these goals through an effective formulation of Markov Decision Process (MDP), can be challenging and often require domain experts in both RL and ITS. Recent advancements in Large Language Models (LLMs) such as GPT-4 highlight their broad general knowledge, reasoning capabilities, and commonsense priors across various domains. In this work, we conduct a large-scale user study involving 70 participants to investigate whether novices can leverage ChatGPT to solve complex mixed traffic control problems. Three environments are tested, including ring road, bottleneck, and intersection. We find ChatGPT has mixed results. For intersection and bottleneck, ChatGPT increases the number of successful policies by 150% and 136% compared to solely beginner capabilities, with some of them even outperforming experts. However, ChatGPT does not provide consistent improvements across all scenarios.
    Explainable and Position-Aware Learning in Digital Pathology. (arXiv:2306.08198v1 [eess.IV])
Encoding whole slide images (WSI) as graphs is well motivated since it makes it possible for the gigapixel resolution WSI to be represented in its entirety for the purpose of graph learning. To this end, WSIs can be broken into smaller patches that represent the nodes of the graph. Then, graph-based learning methods can be utilized for the grading and classification of cancer. Message passing among neighboring nodes is the foundation of graph-based learning methods. However, they do not take into consideration any positional information for any of the patches, and if two patches are found in topologically isomorphic neighborhoods, their embeddings are nearly identical. In this work, classification of cancer from WSIs is performed with positional embedding and graph attention. In order to represent the positional embedding of the nodes in graph classification, the proposed method makes use of spline convolutional neural networks (CNN). The algorithm is then tested with the WSI dataset for grading prostate cancer and kidney cancer. A comparison of the proposed method with leading approaches in cancer diagnosis and grading verifies improved performance. The identification of cancerous regions in WSIs is another critical task in cancer diagnosis. In this work, the explainability of the proposed model is also addressed. A gradient-based explainability approach is used to generate the saliency mapping for the WSIs. This can be used to look into regions of WSI that are responsible for cancer diagnosis thus rendering the proposed model explainable.
    Implicit Compressibility of Overparametrized Neural Networks Trained with Heavy-Tailed SGD. (arXiv:2306.08125v1 [stat.ML])
Neural network compression has been an increasingly important subject, due to its practical implications in terms of reducing the computational requirements and its theoretical implications, as there is an explicit connection between compressibility and the generalization error. Recent studies have shown that the choice of the hyperparameters of stochastic gradient descent (SGD) can have an effect on the compressibility of the learned parameter vector. Even though these results have shed some light on the role of the training dynamics over compressibility, they relied on unverifiable assumptions and the resulting theory does not provide a practical guideline due to its implicitness. In this study, we propose a simple modification for SGD, such that the outputs of the algorithm will be provably compressible without making any nontrivial assumptions. We consider a one-hidden-layer neural network trained with SGD and we inject additive heavy-tailed noise to the iterates at each iteration. We then show that, for any compression rate, there exists a level of overparametrization (i.e., the number of hidden units), such that the output of the algorithm will be compressible with high probability. To achieve this result, we make two main technical contributions: (i) we build on a recent study on stochastic analysis and prove a 'propagation of chaos' result with improved rates for a class of heavy-tailed stochastic differential equations, and (ii) we derive strong-error estimates for their Euler discretization. We finally illustrate our approach on experiments, where the results suggest that the proposed approach achieves compressibility with a slight compromise in training and test error.
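The proposed modification is simple enough to sketch: a plain SGD step followed by additive symmetric alpha-stable noise on each iterate. The stable sampler below uses the standard Chambers-Mallows-Stuck formula; the noise scale and alpha are illustrative, not the paper's settings.

```python
import torch

def sgd_with_stable_noise(params, lr=0.01, noise_scale=1e-3, alpha=1.8):
    """SGD step plus heavy-tailed (alpha < 2) symmetric alpha-stable noise."""
    with torch.no_grad():
        for p in params:
            if p.grad is None:
                continue
            u = (torch.rand_like(p) - 0.5) * torch.pi              # Uniform(-pi/2, pi/2)
            w = -torch.log(torch.rand_like(p).clamp_min(1e-12))    # Exponential(1)
            s = (torch.sin(alpha * u) / torch.cos(u).pow(1.0 / alpha)
                 * (torch.cos(u - alpha * u) / w).pow((1.0 - alpha) / alpha))
            p.add_(p.grad, alpha=-lr)       # vanilla SGD update
            p.add_(s, alpha=noise_scale)    # injected heavy-tailed noise
```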
    Neural Mixed Effects for Nonlinear Personalized Predictions. (arXiv:2306.08149v1 [cs.LG])
    Personalized prediction is a machine learning approach that predicts a person's future observations based on their past labeled observations and is typically used for sequential tasks, e.g., to predict daily mood ratings. When making personalized predictions, a model can combine two types of trends: (a) trends shared across people, i.e., person-generic trends, such as being happier on weekends, and (b) unique trends for each person, i.e., person-specific trends, such as a stressful weekly meeting. Mixed effect models are popular statistical models to study both trends by combining person-generic and person-specific parameters. Though linear mixed effect models are gaining popularity in machine learning by integrating them with neural networks, these integrations are currently limited to linear person-specific parameters: ruling out nonlinear person-specific trends. In this paper, we propose Neural Mixed Effect (NME) models to optimize nonlinear person-specific parameters anywhere in a neural network in a scalable manner. NME combines the efficiency of neural network optimization with nonlinear mixed effects modeling. Empirically, we observe that NME improves performance across six unimodal and multimodal datasets, including a smartphone dataset to predict daily mood and a mother-adolescent dataset to predict affective state sequences where half the mothers experience at least moderate symptoms of depression. Furthermore, we evaluate NME for two model architectures, including for neural conditional random fields (CRF) to predict affective state sequences where the CRF learns nonlinear person-specific temporal transitions between affective states. Analysis of these person-specific transitions on the mother-adolescent dataset shows interpretable trends related to the mother's depression symptoms.
    Inductive Linear Probing for Few-shot Node Classification. (arXiv:2306.08192v1 [cs.LG])
    Meta-learning has emerged as a powerful training strategy for few-shot node classification, demonstrating its effectiveness in the transductive setting. However, the existing literature predominantly focuses on transductive few-shot node classification, neglecting the widely studied inductive setting in the broader few-shot learning community. This oversight limits our comprehensive understanding of the performance of meta-learning based methods on graph data. In this work, we conduct an empirical study to highlight the limitations of current frameworks in the inductive few-shot node classification setting. Additionally, we propose a simple yet competitive baseline approach specifically tailored for inductive few-shot node classification tasks. We hope our work can provide a new path forward to better understand how the meta-learning paradigm works in the graph domain.
    Trustworthy Artificial Intelligence Framework for Proactive Detection and Risk Explanation of Cyber Attacks in Smart Grid. (arXiv:2306.07993v1 [cs.CR])
The rapid growth of distributed energy resources (DERs), such as renewable energy sources, generators, consumers, and prosumers in the smart grid infrastructure, poses significant cybersecurity and trust challenges to the grid controller. Consequently, it is crucial to identify adversarial tactics and measure the strength of the attacker's DER. To enable a trustworthy smart grid controller, this work investigates a trustworthy artificial intelligence (AI) mechanism for proactive identification and explanation of the cyber risk caused by the control/status messages of DERs. We thus propose and develop a trustworthy AI framework that facilitates the deployment of any AI algorithm for detecting potential cyber threats, analyzes root causes based on Shapley value interpretation, and dynamically quantifies the risk of an attack based on Ward's minimum variance formula. The experiment with a state-of-the-art dataset establishes the proposed framework as a trustworthy AI by fulfilling the capabilities of reliability, fairness, explainability, transparency, reproducibility, and accountability.
    DHBE: Data-free Holistic Backdoor Erasing in Deep Neural Networks via Restricted Adversarial Distillation. (arXiv:2306.08009v1 [cs.LG])
Backdoor attacks have emerged as an urgent threat to Deep Neural Networks (DNNs), where victim DNNs are furtively implanted with malicious neurons that could be triggered by the adversary. To defend against backdoor attacks, many works establish a staged pipeline to remove backdoors from victim DNNs: inspecting, locating, and erasing. However, in a scenario where only a few clean data are accessible, such a pipeline is fragile and cannot erase backdoors completely without sacrificing model accuracy. To address this issue, in this paper, we propose a novel data-free holistic backdoor erasing (DHBE) framework. Instead of the staged pipeline, DHBE treats the backdoor erasing task as a unified adversarial procedure, which seeks equilibrium between two different competing processes: distillation and backdoor regularization. In distillation, the backdoored DNN is distilled into a proxy model, transferring its knowledge about clean data, yet backdoors are simultaneously transferred. In backdoor regularization, the proxy model is holistically regularized to prevent it from being infected by any backdoor transferred during distillation. These two processes jointly proceed with data-free adversarial optimization until a clean, high-accuracy proxy model is obtained. With the novel adversarial design, our framework demonstrates its superiority in three aspects: 1) minimal detriment to model accuracy, 2) high tolerance for hyperparameters, and 3) no demand for clean data. Extensive experiments on various backdoor attacks and datasets are performed to verify the effectiveness of the proposed framework. Code is available at https://github.com/yanzhicong/DHBE
    Securing Visually-Aware Recommender Systems: An Adversarial Image Reconstruction and Detection Framework. (arXiv:2306.07992v1 [cs.CV])
    With rich visual data, such as images, becoming readily associated with items, visually-aware recommendation systems (VARS) have been widely used in different applications. Recent studies have shown that VARS are vulnerable to item-image adversarial attacks, which add human-imperceptible perturbations to the clean images associated with those items. Attacks on VARS pose new security challenges to a wide range of applications such as e-Commerce and social networks where VARS are widely used. How to secure VARS from such adversarial attacks becomes a critical problem. Currently, there is still a lack of systematic study on how to design secure defense strategies against visual attacks on VARS. In this paper, we attempt to fill this gap by proposing an adversarial image reconstruction and detection framework to secure VARS. Our proposed method can simultaneously (1) secure VARS from adversarial attacks characterized by local perturbations by image reconstruction based on global vision transformers; and (2) accurately detect adversarial examples using a novel contrastive learning approach. Meanwhile, our framework is designed to be used as both a filter and a detector so that they can be jointly trained to improve the flexibility of our defense strategy to a variety of attacks and VARS models. We have conducted extensive experimental studies with two popular attack methods (FGSM and PGD). Our experimental results on two real-world datasets show that our defense strategy against visual attacks is effective and outperforms existing methods on different attacks. Moreover, our method can detect adversarial examples with high accuracy.
    Semantic-Based Neural Network Repair. (arXiv:2306.07995v1 [cs.LG])
    Recently, neural networks have spread into numerous fields including many safety-critical systems. Neural networks are built (and trained) by programming in frameworks such as TensorFlow and PyTorch. Developers apply a rich set of pre-defined layers to manually program neural networks or to automatically generate them (e.g., through AutoML). Composing neural networks with different layers is error-prone due to the non-trivial constraints that must be satisfied in order to use those layers. In this work, we propose an approach to automatically repair erroneous neural networks. The challenge is in identifying a minimal modification to the network so that it becomes valid. Modifying a layer might have cascading effects on subsequent layers and thus our approach must search recursively to identify a "globally" minimal modification. Our approach is based on an executable semantics of deep learning layers and focuses on four kinds of errors which are common in practice. We evaluate our approach for two usage scenarios, i.e., repairing automatically generated neural networks and manually written ones suffering from common model bugs. The results show that we are able to repair 100% of a set of randomly generated neural networks (which are produced with an existing AI framework testing approach) effectively and efficiently (with an average repair time of 21.08s) and 93.75% of a collection of real neural network bugs (with an average time of 3min 40s).
    A Survey on Cross-Architectural IoT Malware Threat Hunting. (arXiv:2306.07989v1 [cs.CR])
In recent years, the increase in non-Windows malware threats has shifted the focus of the cybersecurity community. Research works on hunting Windows PE-based malware are maturing, whereas the developments on Linux malware threat hunting are relatively scarce. With the advent of the Internet of Things (IoT) era, smart devices that are getting integrated into human life have become a hacker's highway for malicious activities. The IoT devices employ various Unix-based architectures that follow ELF (Executable and Linkable Format) as their standard binary file specification. This study aims at providing a comprehensive survey on the latest developments in cross-architectural IoT malware detection and classification approaches. Aided by a modern taxonomy, we discuss the feature representations, feature extraction techniques, and machine learning models employed in the surveyed works. We further provide more insights on the practical challenges involved in cross-architectural IoT malware threat hunting and discuss various avenues to instill potential future research.
TopP&R: Robust Support Estimation Approach for Evaluating Fidelity and Diversity in Generative Models. (arXiv:2306.08013v1 [cs.LG])
We propose a robust and reliable evaluation metric for generative models by introducing topological and statistical treatments for rigorous support estimation. Existing metrics, such as Inception Score (IS), Fréchet Inception Distance (FID), and the variants of Precision and Recall (P&R), heavily rely on supports that are estimated from sample features. However, the reliability of their estimation has not been seriously discussed (and is often overlooked) even though the quality of the evaluation entirely depends on it. In this paper, we propose Topological Precision and Recall (TopP&R, pronounced 'topper'), which provides a systematic approach to estimating supports, retaining only topologically and statistically important features with a certain level of confidence. This not only makes TopP&R robust to noisy features, but also provides statistical consistency. Our theoretical and experimental results show that TopP&R is robust to outliers and non-independent and identically distributed (Non-IID) perturbations, while accurately capturing the true trend of change in samples. To the best of our knowledge, this is the first evaluation metric focused on the robust estimation of the support and provides its statistical consistency under noise.
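To make the support-estimation framing concrete, here is a rough toy version of support-based precision using a thresholded KDE; TopP&R adds topological and statistical machinery on top of this basic idea, and the data and threshold below are invented for illustration:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
real = rng.standard_normal((2000, 2))            # stand-in for real features
fake = rng.standard_normal((1000, 2)) * 1.5 + 0.5  # stand-in for generated features

kde = gaussian_kde(real.T)
tau = np.quantile(kde(real.T), 0.05)             # region holding ~95% of real mass
precision = float((kde(fake.T) >= tau).mean())   # fraction of fakes inside the support
print("support-based precision:", precision)
```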
    Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models. (arXiv:2306.08018v1 [q-bio.QM])
    Large Language Models (LLMs), with their remarkable task-handling capabilities and innovative outputs, have catalyzed significant advancements across a spectrum of fields. However, their proficiency within specialized domains such as biomolecular studies remains limited. To address this challenge, we introduce Mol-Instructions, a meticulously curated, comprehensive instruction dataset expressly designed for the biomolecular realm. Mol-Instructions is composed of three pivotal components: molecule-oriented instructions, protein-oriented instructions, and biomolecular text instructions, each curated to enhance the understanding and prediction capabilities of LLMs concerning biomolecular features and behaviors. Through extensive instruction tuning experiments on the representative LLM, we underscore the potency of Mol-Instructions to enhance the adaptability and cognitive acuity of large models within the complex sphere of biomolecular studies, thereby promoting advancements in the biomolecular research community. Mol-Instructions is made publicly accessible for future research endeavors and will be subjected to continual updates for enhanced applicability.
    (Amplified) Banded Matrix Factorization: A unified approach to private training. (arXiv:2306.08153v1 [cs.LG])
    Matrix factorization (MF) mechanisms for differential privacy (DP) have substantially improved the state-of-the-art in privacy-utility-computation tradeoffs for ML applications in a variety of scenarios, but in both the centralized and federated settings there remain instances where either MF cannot be easily applied, or other algorithms provide better tradeoffs (typically, as $\epsilon$ becomes small). In this work, we show how MF can subsume prior state-of-the-art algorithms in both federated and centralized training settings, across all privacy budgets. The key technique throughout is the construction of MF mechanisms with banded matrices. For cross-device federated learning (FL), this enables multiple-participations with a relaxed device participation schema compatible with practical FL infrastructure (as demonstrated by a production deployment). In the centralized setting, we prove that banded matrices enjoy the same privacy amplification results as for the ubiquitous DP-SGD algorithm, but can provide strictly better performance in most scenarios -- this lets us always at least match DP-SGD, and often outperform it even at $\epsilon\ll2$. Finally, $\hat{b}$-banded matrices substantially reduce the memory and time complexity of per-step noise generation from $\mathcal{O}(n)$, $n$ the total number of iterations, to a constant $\mathcal{O}(\hat{b})$, compared to general MF mechanisms.
    MSSRNet: Manipulating Sequential Style Representation for Unsupervised Text Style Transfer. (arXiv:2306.07994v1 [cs.CL])
The unsupervised text style transfer task aims to rewrite a text into a target style while preserving its main content. Traditional methods rely on the use of a fixed-sized vector to regulate text style, which makes it difficult to accurately convey the style strength of each individual token. In fact, each token of a text carries a different style intensity and makes a different contribution to the overall style. Our proposed method addresses this issue by assigning an individual style vector to each token in a text, allowing for fine-grained control and manipulation of the style strength. Additionally, an adversarial training framework integrated with teacher-student learning is introduced to enhance training stability and reduce the complexity of high-dimensional optimization. The results of our experiments demonstrate the efficacy of our method in terms of clearly improved style transfer accuracy and content preservation in both two-style transfer and multi-style transfer settings.
    Machine Learning Approach on Multiclass Classification of Internet Firewall Log Files. (arXiv:2306.07997v1 [cs.CR])
Firewalls are critical components in securing communication networks: they screen all incoming (and occasionally exiting) data packets, comparing them against a set of rules designed to prevent malicious code from entering the network. To regulate this flow, an Internet firewall keeps track of all activity and decides whether to 'allow,' 'deny,' 'drop,' or 'reset-both' each packet. While the primary function of the resulting log files is to aid in troubleshooting and diagnostics, the information they contain is also highly relevant to system audits and forensics; examining them is necessary to better defend against cyberattacks and understand when and how malicious actions affect the network. In this research, we apply various classification algorithms to make sense of data logged by a firewall device. Classifier performance is compared using the F1 score (the harmonic mean of precision and recall), recall, and sensitivity, with the random forest technique achieving a 99% accuracy score. The proposed features significantly improved the firewall classification rate, as reflected in the high accuracy rates achieved by the other methods as well.
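A minimal version of this multiclass setup is easy to sketch; the file name and column names below are placeholders, not the paper's exact schema:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

logs = pd.read_csv("firewall_logs.csv")        # hypothetical firewall log export
X = logs[["src_port", "dst_port", "bytes", "packets", "elapsed_sec"]]
y = logs["action"]                             # allow / deny / drop / reset-both

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))  # per-class precision/recall/F1
```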
DORSal: Diffusion for Object-centric Representations of Scenes et al. (arXiv:2306.08068v1 [cs.CV])
    Recent progress in 3D scene understanding enables scalable learning of representations across large datasets of diverse scenes. As a consequence, generalization to unseen scenes and objects, rendering novel views from just a single or a handful of input images, and controllable scene generation that supports editing, is now possible. However, training jointly on a large number of scenes typically compromises rendering quality when compared to single-scene optimized models such as NeRFs. In this paper, we leverage recent progress in diffusion models to equip 3D scene representation learning models with the ability to render high-fidelity novel views, while retaining benefits such as object-level scene editing to a large degree. In particular, we propose DORSal, which adapts a video diffusion architecture for 3D scene generation conditioned on object-centric slot-based representations of scenes. On both complex synthetic multi-object scenes and on the real-world large-scale Street View dataset, we show that DORSal enables scalable neural rendering of 3D scenes with object-level editing and improves upon existing approaches.
    Chainlet Orbits: Topological Address Embedding for the Bitcoin Blockchain. (arXiv:2306.07974v1 [cs.CR])
    The rise of cryptocurrencies like Bitcoin, which enable transactions with a degree of pseudonymity, has led to a surge in various illicit activities, including ransomware payments and transactions on darknet markets. These illegal activities often utilize Bitcoin as the preferred payment method. However, current tools for detecting illicit behavior either rely on a few heuristics and laborious data collection processes or employ computationally inefficient graph neural network (GNN) models that are challenging to interpret. To overcome the computational and interpretability limitations of existing techniques, we introduce an effective solution called Chainlet Orbits. This approach embeds Bitcoin addresses by leveraging their topological characteristics in transactions. By employing our innovative address embedding, we investigate e-crime in Bitcoin networks by focusing on distinctive substructures that arise from illicit behavior. The results of our node classification experiments demonstrate superior performance compared to state-of-the-art methods, including both topological and GNN-based approaches. Moreover, our approach enables the use of interpretable and explainable machine learning models in as little as 15 minutes for most days on the Bitcoin transaction network.
    Beyond Black Box AI-Generated Plagiarism Detection: From Sentence to Document Level. (arXiv:2306.08122v1 [cs.CL])
    The increasing reliance on large language models (LLMs) in academic writing has led to a rise in plagiarism. Existing AI-generated text classifiers have limited accuracy and often produce false positives. We propose a novel approach using natural language processing (NLP) techniques, offering quantifiable metrics at both sentence and document levels for easier interpretation by human evaluators. Our method employs a multi-faceted approach, generating multiple paraphrased versions of a given question and inputting them into the LLM to generate answers. By using a contrastive loss function based on cosine similarity, we match generated sentences with those from the student's response. Our approach achieves up to 94% accuracy in classifying human and AI text, providing a robust and adaptable solution for plagiarism detection in academic settings. This method improves with LLM advancements, reducing the need for new model training or reconfiguration, and offers a more transparent way of evaluating and detecting AI-generated text.
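The sentence-matching step can be illustrated with a toy example. TF-IDF vectors stand in here for the learned representations and contrastive objective the paper describes; the sentences and threshold intuition are invented:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

generated = ["The mitochondria is the powerhouse of the cell.",
             "Cellular respiration occurs within the mitochondria."]
student = ["Mitochondria are the cell's powerhouse.",
           "I observed the reaction in a test tube."]

vec = TfidfVectorizer().fit(generated + student)
sims = cosine_similarity(vec.transform(student), vec.transform(generated))
for sent, row in zip(student, sims):
    # a high max-similarity score flags a sentence as likely AI-derived
    print(f"{row.max():.2f}  {sent}")
```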
    Unbiased Learning of Deep Generative Models with Structured Discrete Representations. (arXiv:2306.08230v1 [cs.LG])
    By composing graphical models with deep learning architectures, we learn generative models with the strengths of both frameworks. The structured variational autoencoder (SVAE) inherits structure and interpretability from graphical models, and flexible likelihoods for high-dimensional data from deep learning, but poses substantial optimization challenges. We propose novel algorithms for learning SVAEs, and are the first to demonstrate the SVAE's ability to handle multimodal uncertainty when data is missing by incorporating discrete latent variables. Our memory-efficient implicit differentiation scheme makes the SVAE tractable to learn via gradient descent, while demonstrating robustness to incomplete optimization. To more rapidly learn accurate graphical model parameters, we derive a method for computing natural gradients without manual derivations, which avoids biases found in prior work. These optimization innovations enable the first comparisons of the SVAE to state-of-the-art time series models, where the SVAE performs competitively while learning interpretable and structured discrete data representations.
    Model-Free Market Risk Hedging Using Crowding Networks. (arXiv:2306.08105v1 [q-fin.PM])
    Crowding is widely regarded as one of the most important risk factors in designing portfolio strategies. In this paper, we analyze stock crowding using network analysis of fund holdings, which is used to compute crowding scores for stocks. These scores are used to construct costless long-short portfolios, computed in a distribution-free (model-free) way and without using any numerical optimization, with desirable properties of hedge portfolios. More specifically, these long-short portfolios provide protection for both small and large market price fluctuations, due to their negative correlation with the market and positive convexity as a function of market returns. By adding our long-short portfolio to a baseline portfolio such as a traditional 60/40 portfolio, our method provides an alternative way to hedge portfolio risk including tail risk, which does not require costly option-based strategies or complex numerical optimization. The total cost of such hedging amounts to the total cost of rebalancing the hedge portfolio.
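A toy holdings-based crowding score conveys the flavor of the network construction: stocks co-held by many funds score higher. The matrix and degree-style score below are simplifications, not the paper's actual procedure:

```python
import numpy as np

holdings = np.array([[1, 1, 0, 0],    # rows: funds, columns: stocks
                     [1, 1, 1, 0],
                     [0, 1, 1, 1]], dtype=float)
co_holding = holdings.T @ holdings     # stock-by-stock co-ownership network
np.fill_diagonal(co_holding, 0.0)
crowding = co_holding.sum(axis=1)      # simple degree-style crowding score
print(crowding)                        # short the most crowded, long the least
```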
    Symbolic Regression via Control Variable Genetic Programming. (arXiv:2306.08057v1 [cs.NE])
Learning symbolic expressions directly from experiment data is a vital step in AI-driven scientific discovery. Nevertheless, state-of-the-art approaches are limited to learning simple expressions. Regressing expressions involving many independent variables still remains out of reach. Motivated by the control variable experiments widely utilized in science, we propose Control Variable Genetic Programming (CVGP) for symbolic regression over many independent variables. CVGP expedites symbolic expression discovery via customized experiment design, rather than learning from a fixed dataset collected a priori. CVGP starts by fitting simple expressions involving a small set of independent variables using genetic programming, under controlled experiments where other variables are held as constants. It then extends expressions learned in previous generations by adding new independent variables, using new control variable experiments in which these variables are allowed to vary. Theoretically, we show CVGP as an incremental building approach can yield an exponential reduction in the search space when learning a class of expressions. Experimentally, CVGP outperforms several baselines in learning symbolic expressions involving multiple independent variables.
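The control variable primitive CVGP builds on is simple to demonstrate: query the system with all but one variable held constant, fit a simple expression in the free variable, then let more variables vary in later rounds. The oracle below is a made-up ground truth for illustration:

```python
import numpy as np

def oracle(x1, x2, x3):
    # hidden ground-truth system being probed, y = 3*x1*x2 + sin(x3)
    return 3.0 * x1 * x2 + np.sin(x3)

x1 = np.linspace(0.0, 1.0, 50)
y = oracle(x1, x2=0.7, x3=0.0)              # x2 and x3 are controlled constants
slope, intercept = np.polyfit(x1, y, deg=1)  # fit y = a*x1 + b in the free variable
print(slope, intercept)                      # ~2.1 and ~0.0, i.e. a = 3 * 0.7
```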
    AutoML in the Age of Large Language Models: Current Challenges, Future Opportunities and Risks. (arXiv:2306.08107v1 [cs.LG])
    The fields of both Natural Language Processing (NLP) and Automated Machine Learning (AutoML) have achieved remarkable results over the past years. In NLP, especially Large Language Models (LLMs) have experienced a rapid series of breakthroughs very recently. We envision that the two fields can radically push the boundaries of each other through tight integration. To showcase this vision, we explore the potential of a symbiotic relationship between AutoML and LLMs, shedding light on how they can benefit each other. In particular, we investigate both the opportunities to enhance AutoML approaches with LLMs from different perspectives and the challenges of leveraging AutoML to further improve LLMs. To this end, we survey existing work, and we critically assess risks. We strongly believe that the integration of the two fields has the potential to disrupt both fields, NLP and AutoML. By highlighting conceivable synergies, but also risks, we aim to foster further exploration at the intersection of AutoML and LLMs.
    On Faking a Nash Equilibrium. (arXiv:2306.08041v1 [cs.MA])
    We characterize offline data poisoning attacks on Multi-Agent Reinforcement Learning (MARL), where an attacker may change a data set in an attempt to install a (potentially fictitious) unique Markov-perfect Nash equilibrium. We propose the unique Nash set, namely the set of games, specified by their Q functions, with a specific joint policy being the unique Nash equilibrium. The unique Nash set is central to poisoning attacks because the attack is successful if and only if data poisoning pushes all plausible games inside it. The unique Nash set generalizes the reward polytope commonly used in inverse reinforcement learning to MARL. For zero-sum Markov games, both the inverse Nash set and the set of plausible games induced by data are polytopes in the Q function space. We exhibit a linear program to efficiently compute the optimal poisoning attack. Our work sheds light on the structure of data poisoning attacks on offline MARL, a necessary step before one can design more robust MARL algorithms.
    Improving Zero-Shot Detection of Low Prevalence Chest Pathologies using Domain Pre-trained Language Models. (arXiv:2306.08000v1 [physics.med-ph])
Recent advances in zero-shot learning have enabled the use of paired image-text data to replace structured labels, eliminating the need for expert-annotated datasets. Models such as CLIP-based CheXzero utilize these advancements in the domain of chest X-ray interpretation. We hypothesize that domain pre-trained models such as CXR-BERT, BlueBERT, and ClinicalBERT offer the potential to improve the performance of CLIP-like models with specific domain knowledge by replacing BERT weights at the cost of breaking the original model's alignment. We evaluate the performance of zero-shot classification models with domain-specific pre-training for detecting low-prevalence pathologies. Even though replacing the weights of the original CLIP-BERT degrades model performance on commonly found pathologies, we show that pre-trained text towers perform exceptionally better on low-prevalence diseases. This motivates future ensemble models with a combination of differently trained language models for maximal performance.
    Diffusion-based Conditional ECG Generation with Structured State Space Models. (arXiv:2301.08227v2 [eess.SP] UPDATED)
    Synthetic data generation is a promising solution to address privacy issues with the distribution of sensitive health data. Recently, diffusion models have set new standards for generative models for different data modalities. Also very recently, structured state space models emerged as a powerful modeling paradigm to capture long-term dependencies in time series. We put forward SSSD-ECG, as the combination of these two technologies, for the generation of synthetic 12-lead electrocardiograms conditioned on more than 70 ECG statements. Due to a lack of reliable baselines, we also propose conditional variants of two state-of-the-art unconditional generative models. We thoroughly evaluate the quality of the generated samples, by evaluating pretrained classifiers on the generated data and by evaluating the performance of a classifier trained only on synthetic data, where SSSD-ECG clearly outperforms its GAN-based competitors. We demonstrate the soundness of our approach through further experiments, including conditional class interpolation and a clinical Turing test demonstrating the high quality of the SSSD-ECG samples across a wide range of conditions.
    Improving Training Stability for Multitask Ranking Models in Recommender Systems. (arXiv:2302.09178v2 [cs.LG] UPDATED)
    Recommender systems play an important role in many content platforms. While most recommendation research is dedicated to designing better models to improve user experience, we found that research on stabilizing the training for such models is severely under-explored. As recommendation models become larger and more sophisticated, they are more susceptible to training instability issues, i.e., loss divergence, which can make the model unusable, waste significant resources and block model developments. In this paper, we share our findings and best practices we learned for improving the training stability of a real-world multitask ranking model for YouTube recommendations. We show some properties of the model that lead to unstable training and conjecture on the causes. Furthermore, based on our observations of training dynamics near the point of training instability, we hypothesize why existing solutions would fail, and propose a new algorithm to mitigate the limitations of existing solutions. Our experiments on YouTube production dataset show the proposed algorithm can significantly improve training stability while not compromising convergence, comparing with several commonly used baseline methods.
    LightGCL: Simple Yet Effective Graph Contrastive Learning for Recommendation. (arXiv:2302.08191v3 [cs.IR] UPDATED)
    Graph neural network (GNN) is a powerful learning approach for graph-based recommender systems. Recently, GNNs integrated with contrastive learning have shown superior performance in recommendation with their data augmentation schemes, aiming at dealing with highly sparse data. Despite their success, most existing graph contrastive learning methods either perform stochastic augmentation (e.g., node/edge perturbation) on the user-item interaction graph, or rely on the heuristic-based augmentation techniques (e.g., user clustering) for generating contrastive views. We argue that these methods cannot well preserve the intrinsic semantic structures and are easily biased by the noise perturbation. In this paper, we propose a simple yet effective graph contrastive learning paradigm LightGCL that mitigates these issues impairing the generality and robustness of CL-based recommenders. Our model exclusively utilizes singular value decomposition for contrastive augmentation, which enables the unconstrained structural refinement with global collaborative relation modeling. Experiments conducted on several benchmark datasets demonstrate the significant improvement in performance of our model over the state-of-the-arts. Further analyses demonstrate the superiority of LightGCL's robustness against data sparsity and popularity bias. The source code of our model is available at https://github.com/HKUDS/LightGCL.
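The core augmentation can be sketched in a few lines: a low-rank SVD reconstruction of the user-item adjacency serves as the globally smoothed contrastive view. This omits the GNN backbone and the contrastive loss around it, and the shapes and rank are toys:

```python
import torch

adj = (torch.rand(500, 300) < 0.02).float()   # toy user-item interaction matrix
q = 16                                        # rank of the approximation (hyperparameter)
u, s, v = torch.svd_lowrank(adj, q=q)
adj_view = u @ torch.diag(s) @ v.T            # low-rank view capturing global structure
```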
    Subject Granular Differential Privacy in Federated Learning. (arXiv:2206.03617v2 [cs.LG] UPDATED)
This paper considers subject level privacy in the FL setting, where a subject is an individual whose private information is embodied by several data items either confined within a single federation user or distributed across multiple federation users. We propose two new algorithms that enforce subject level DP at each federation user locally. Our first algorithm, called LocalGroupDP, is a straightforward application of group differential privacy in the popular DP-SGD algorithm. Our second algorithm is based on a novel idea of hierarchical gradient averaging (HiGradAvgDP) for subjects participating in a training mini-batch. We also show that user level Local Differential Privacy (LDP) naturally guarantees subject level DP. We observe the problem of horizontal composition of subject level privacy loss in FL: subject level privacy loss incurred at individual users composes across the federation. We formally prove the subject level DP guarantee for our algorithms, and also show their effect on model utility loss. Our empirical evaluation on FEMNIST and Shakespeare datasets shows that LocalGroupDP delivers the best performance among our algorithms. However, its model utility lags behind that of models trained using a DP-SGD based algorithm that provides a weaker item level privacy guarantee. Privacy loss amplification due to subject sampling fractions and horizontal composition remain key challenges for model utility.
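A rough sketch of the hierarchical averaging step, as this reading of the abstract suggests: average gradients within each subject first, then across subjects, so no single subject dominates the mini-batch update. The clipping and noise needed for the actual DP guarantee are omitted, and gradients are treated as flat tensors for simplicity:

```python
import torch

def hierarchical_grad_avg(per_example_grads, subject_ids):
    """Two-level average: per-subject means, then the mean over subjects."""
    subjects = sorted(set(subject_ids))
    per_subject = [
        torch.stack([g for g, s in zip(per_example_grads, subject_ids) if s == sid]).mean(0)
        for sid in subjects
    ]
    return torch.stack(per_subject).mean(0)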
    Revisiting the Gumbel-Softmax in MADDPG. (arXiv:2302.11793v2 [cs.LG] UPDATED)
    MADDPG is an algorithm in multi-agent reinforcement learning (MARL) that extends the popular single-agent method, DDPG, to multi-agent scenarios. Importantly, DDPG is an algorithm designed for continuous action spaces, where the gradient of the state-action value function exists. For this algorithm to work in discrete action spaces, discrete gradient estimation must be performed. For MADDPG, the Gumbel-Softmax (GS) estimator is used -- a reparameterisation which relaxes a discrete distribution into a similar continuous one. This method, however, is statistically biased, and a recent MARL benchmarking paper suggests that this bias makes MADDPG perform poorly in grid-world situations, where the action space is discrete. Fortunately, many alternatives to the GS exist, boasting a wide range of properties. This paper explores several of these alternatives and integrates them into MADDPG for discrete grid-world scenarios. The corresponding impact on various performance metrics is then measured and analysed. It is found that one of the proposed estimators performs significantly better than the original GS in several tasks, achieving up to 55% higher returns, along with faster convergence.
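For reference, the estimator under discussion is the standard Gumbel-Softmax relaxation, sketched below in textbook form (PyTorch also ships this as torch.nn.functional.gumbel_softmax):

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, tau=1.0, hard=False):
    """Relax a categorical sample into a differentiable, near-one-hot vector."""
    gumbels = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    y = F.softmax((logits + gumbels) / tau, dim=-1)
    if hard:  # straight-through: discrete forward pass, continuous gradients
        index = y.argmax(dim=-1, keepdim=True)
        y_hard = torch.zeros_like(y).scatter_(-1, index, 1.0)
        y = y_hard - y.detach() + y
    return y
```

The bias the paper discusses comes from the relaxation itself: the softened sample is not distributed exactly like the underlying categorical draw.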
    Multivariate Systemic Risk Measures and Computation by Deep Learning Algorithms. (arXiv:2302.10183v2 [cs.LG] UPDATED)
    In this work we propose deep learning-based algorithms for the computation of systemic shortfall risk measures defined via multivariate utility functions. We discuss the key related theoretical aspects, with a particular focus on the fairness properties of primal optima and associated risk allocations. The algorithms we provide allow for learning primal optimizers, optima for the dual representation and corresponding fair risk allocations. We test our algorithms by comparison to a benchmark model, based on a paired exponential utility function, for which we can provide explicit formulas. We also show evidence of convergence in a case for which explicit formulas are not available.
    Optimistic Planning by Regularized Dynamic Programming. (arXiv:2302.14004v3 [cs.LG] UPDATED)
    We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes based on the idea of adding regularization to the updates of an otherwise standard approximate value iteration procedure. This technique allows us to avoid contraction and monotonicity arguments typically required by existing analyses of approximate dynamic programming methods, and in particular to use approximate transition functions estimated via least-squares procedures in MDPs with linear function approximation. We use our method to recover known guarantees in tabular MDPs and to provide a computationally efficient algorithm for learning near-optimal policies in discounted linear mixture MDPs from a single stream of experience, and show it achieves near-optimal statistical guarantees.
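As a rough illustration of what a regularized dynamic programming update looks like, here is a tabular toy in which the hard Bellman max is replaced by an entropy-smoothed soft maximum; the paper's actual algorithm targets linear function approximation and differs in its details:

```python
import numpy as np
from scipy.special import logsumexp

def regularized_value_iteration(P, R, gamma=0.9, eta=0.1, iters=500):
    """P: (A, S, S) transition tensor, R: (S, A) rewards, eta: temperature."""
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        Q = R + gamma * (P @ V).T               # one-step lookahead values
        V = eta * logsumexp(Q / eta, axis=1)    # soft (log-sum-exp) max over actions
    return V
```

As eta tends to zero the update recovers standard value iteration; the smoothing is what lets analyses avoid the contraction and monotonicity arguments the abstract mentions.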
    Surgical Aggregation: A Collaborative Learning Framework for Harmonizing Distributed Medical Imaging Datasets with Diverse Tasks. (arXiv:2301.06683v4 [cs.CV] UPDATED)
    Large-scale chest x-ray datasets have been curated for the detection of abnormalities using deep learning, with the potential to provide substantial benefits across many clinical applications. However, each dataset focuses only on a subset of findings that can be simultaneously present in a patient, making it challenging to train models that aggregate multiple datasets together. Therefore, data harmonization is crucial to leverage these datasets in aggregate to train clinically useful models with a complete representation of abnormalities that may occur within the thorax. To that end, we propose surgical aggregation, a collaborative learning framework for harmonizing and aggregating knowledge from distributed heterogeneous datasets with partial annotations. We evaluate surgical aggregation across synthetic and real-world heterogeneous datasets with partial annotations. Our results indicate that surgical aggregation outperforms current strategies, generalizes better, and has the potential to facilitate the development of clinically useful models even when using datasets with heterogeneous disease labels.
    Debiasing Recommendation by Learning Identifiable Latent Confounders. (arXiv:2302.05052v2 [cs.LG] UPDATED)
    Recommendation systems aim to predict users' feedback on items not exposed to them. Confounding bias arises due to the presence of unmeasured variables (e.g., the socio-economic status of a user) that can affect both a user's exposure and feedback. Existing methods either (1) make untenable assumptions about these unmeasured variables or (2) directly infer latent confounders from users' exposure. However, they cannot guarantee the identification of counterfactual feedback, which can lead to biased predictions. In this work, we propose a novel method, i.e., identifiable deconfounder (iDCF), which leverages a set of proxy variables (e.g., observed user features) to resolve the aforementioned non-identification issue. The proposed iDCF is a general deconfounded recommendation framework that applies proximal causal inference to infer the unmeasured confounders and identify the counterfactual feedback with theoretical guarantees. Extensive experiments on various real-world and synthetic datasets verify the proposed method's effectiveness and robustness.
    Bandit Social Learning: Exploration under Myopic Behavior. (arXiv:2302.07425v3 [cs.GT] UPDATED)
We study social learning dynamics where the agents collectively follow a simple multi-armed bandit protocol. Agents arrive sequentially, choose arms and receive associated rewards. Each agent observes the full history (arms and rewards) of the previous agents, and there are no private signals. While collectively the agents face an exploration-exploitation tradeoff, each agent acts myopically, without regard to exploration. Motivating scenarios concern reviews and ratings on online platforms. We allow a wide range of myopic behaviors that are consistent with (parameterized) confidence intervals, including the "unbiased" behavior as well as various behavioral biases. While extreme versions of these behaviors correspond to well-known bandit algorithms, we prove that more moderate versions lead to stark exploration failures, and consequently to regret rates that are linear in the number of agents. We provide matching upper bounds on regret by analyzing "moderately optimistic" agents. As a special case of independent interest, we obtain a general result on failure of the greedy algorithm in multi-armed bandits. This is the first such result in the literature, to the best of our knowledge.
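The greedy failure mode is easy to reproduce in simulation. The sketch below (our toy construction, with arbitrary arm means) has each arriving agent pull the arm with the best empirical mean over the shared public history; with constant probability the agents lock onto the worse arm, so average regret grows linearly in the number of agents.

```python
import numpy as np

def run(n_agents=2000, mu=(0.5, 0.7), seed=0):
    """Each agent greedily pulls the arm with the best empirical mean over
    the public history; unseen arms get an optimistic placeholder of 1.0."""
    rng = np.random.default_rng(seed)
    counts, sums, regret = np.zeros(2), np.zeros(2), 0.0
    for _ in range(n_agents):
        means = np.where(counts > 0, sums / np.maximum(counts, 1), 1.0)
        a = int(np.argmax(means))
        counts[a] += 1
        sums[a] += rng.random() < mu[a]      # Bernoulli reward
        regret += max(mu) - mu[a]
    return regret

# Averaged over histories, regret scales linearly with the number of agents.
print(np.mean([run(seed=s) for s in range(50)]))
```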
    Tackling Shortcut Learning in Deep Neural Networks: An Iterative Approach with Interpretable Models. (arXiv:2302.10289v6 [cs.LG] UPDATED)
We use concept-based interpretable models to mitigate shortcut learning; existing mitigation methods lack such interpretability. Beginning with a Blackbox (BB), we iteratively \emph{carve out} a mixture of interpretable experts (MoIE) and a \emph{residual network}. Each expert explains a subset of data using First Order Logic (FOL). While explaining a sample, the FOL from the biased BB-derived MoIE detects the shortcut effectively. Finetuning the BB with Metadata Normalization (MDN) eliminates the shortcut. The FOLs from the finetuned-BB-derived MoIE verify the elimination of the shortcut. Our experiments show that MoIE does not hurt the accuracy of the original BB and eliminates shortcuts effectively.
    Learning Manifold Dimensions with Conditional Variational Autoencoders. (arXiv:2302.11756v2 [cs.LG] UPDATED)
    Although the variational autoencoder (VAE) and its conditional extension (CVAE) are capable of state-of-the-art results across multiple domains, their precise behavior is still not fully understood, particularly in the context of data (like images) that lie on or near a low-dimensional manifold. For example, while prior work has suggested that the globally optimal VAE solution can learn the correct manifold dimension, a necessary (but not sufficient) condition for producing samples from the true data distribution, this has never been rigorously proven. Moreover, it remains unclear how such considerations would change when various types of conditioning variables are introduced, or when the data support is extended to a union of manifolds (e.g., as is likely the case for MNIST digits and related). In this work, we address these points by first proving that VAE global minima are indeed capable of recovering the correct manifold dimension. We then extend this result to more general CVAEs, demonstrating practical scenarios whereby the conditioning variables allow the model to adaptively learn manifolds of varying dimension across samples. Our analyses, which have practical implications for various CVAE design choices, are also supported by numerical results on both synthetic and real-world datasets.
    SOBER: Highly Parallel Bayesian Optimization and Bayesian Quadrature over Discrete and Mixed Spaces. (arXiv:2301.11832v3 [cs.LG] UPDATED)
    Batch Bayesian optimisation and Bayesian quadrature have been shown to be sample-efficient methods of performing optimisation and quadrature where expensive-to-evaluate objective functions can be queried in parallel. However, current methods do not scale to large batch sizes -- a frequent desideratum in practice (e.g. drug discovery or simulation-based inference). We present a novel algorithm, SOBER, which permits scalable and diversified batch global optimisation and quadrature with arbitrary acquisition functions and kernels over discrete and mixed spaces. The key to our approach is to reformulate batch selection for global optimisation as a quadrature problem, which relaxes acquisition function maximisation (non-convex) to kernel recombination (convex). Bridging global optimisation and quadrature can efficiently solve both tasks by balancing the merits of exploitative Bayesian optimisation and explorative Bayesian quadrature. We show that SOBER outperforms 11 competitive baselines on 12 synthetic and diverse real-world tasks.
    Continual Vision-based Reinforcement Learning with Group Symmetries. (arXiv:2210.12301v2 [cs.LG] UPDATED)
    Continual reinforcement learning aims to sequentially learn a variety of tasks, retaining the ability to perform previously encountered tasks while simultaneously developing new policies for novel tasks. However, current continual RL approaches overlook the fact that certain tasks are identical under basic group operations like rotations or translations, especially with visual inputs. They may unnecessarily learn and maintain a new policy for each similar task, leading to poor sample efficiency and weak generalization capability. To address this, we introduce a unique Continual Vision-based Reinforcement Learning method that recognizes Group Symmetries, called COVERS, cultivating a policy for each group of equivalent tasks rather than individual tasks. COVERS employs a proximal policy optimization-based RL algorithm with an equivariant feature extractor and a novel task grouping mechanism that relies on the extracted invariant features. We evaluate COVERS on sequences of table-top manipulation tasks that incorporate image observations and robot proprioceptive information in both simulations and on real robot platforms. Our results show that COVERS accurately assigns tasks to their respective groups and significantly outperforms existing methods in terms of generalization capability.  ( 2 min )
    Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language. (arXiv:2212.07525v2 [cs.LG] UPDATED)
    Current self-supervised learning algorithms are often modality-specific and require large amounts of computational resources. To address these issues, we increase the training efficiency of data2vec, a learning objective that generalizes across several modalities. We do not encode masked tokens, use a fast convolutional decoder and amortize the effort to build teacher representations. data2vec 2.0 benefits from the rich contextualized target representations introduced in data2vec which enable a fast self-supervised learner. Experiments on ImageNet-1K image classification show that data2vec 2.0 matches the accuracy of Masked Autoencoders in 16.4x lower pre-training time, on Librispeech speech recognition it performs as well as wav2vec 2.0 in 10.6x less time, and on GLUE natural language understanding it matches a retrained RoBERTa model in half the time. Trading some speed for accuracy results in ImageNet-1K top-1 accuracy of 86.8\% with a ViT-L model trained for 150 epochs.
    Stop overkilling simple tasks with black-box models and use transparent models instead. (arXiv:2302.02804v2 [cs.LG] UPDATED)
    In recent years, the employment of deep learning methods has led to several significant breakthroughs in artificial intelligence. Different from traditional machine learning models, deep learning-based approaches are able to extract features autonomously from raw data. This allows for bypassing the feature engineering process, which is generally considered to be both error-prone and tedious. Moreover, deep learning strategies often outperform traditional models in terms of accuracy.
    Compositional Scene Representation Learning via Reconstruction: A Survey. (arXiv:2202.07135v4 [cs.LG] UPDATED)
    Visual scenes are composed of visual concepts and have the property of combinatorial explosion. An important reason for humans to efficiently learn from diverse visual scenes is the ability of compositional perception, and it is desirable for artificial intelligence to have similar abilities. Compositional scene representation learning is a task that enables such abilities. In recent years, various methods have been proposed to apply deep neural networks, which have been proven to be advantageous in representation learning, to learn compositional scene representations via reconstruction, advancing this research direction into the deep learning era. Learning via reconstruction is advantageous because it may utilize massive unlabeled data and avoid costly and laborious data annotation. In this survey, we first outline the current progress on reconstruction-based compositional scene representation learning with deep neural networks, including development history and categorizations of existing methods from the perspectives of the modeling of visual scenes and the inference of scene representations; then provide benchmarks, including an open source toolbox to reproduce the benchmark experiments, of representative methods that consider the most extensively studied problem setting and form the foundation for other methods; and finally discuss the limitations of existing methods and future directions of this research topic.  ( 2 min )
    From Human Days to Machine Seconds: Automatically Answering and Generating Machine Learning Final Exams. (arXiv:2206.05442v6 [cs.LG] UPDATED)
    A final exam in machine learning at a top institution such as MIT, Harvard, or Cornell typically takes faculty days to write, and students hours to solve. We demonstrate that large language models pass machine learning finals at a human level on a corpus drawn from MIT, Harvard, and Cornell and automatically generate new human-quality final exam questions in seconds. Previous work has developed program synthesis and few-shot learning methods to solve university-level problem set questions in mathematics and STEM courses. In this work, we develop and compare methods that solve final exams, which differ from problem sets in several ways: the questions are longer, have multiple parts, are more complicated, and span a broader set of topics. We provide a new dataset and benchmark of questions from machine learning final exams and code for answering these questions and generating new questions. We show how to generate new questions from other questions and course notes. We evaluate a large open language model, Meta's OPT, and compare the results with OpenAI's closed models. A student survey comparing the quality, appropriateness, and difficulty of machine-generated questions with human-written questions shows that across multiple aspects, machine-generated questions are indistinguishable from human-generated questions and are suitable for final exams. We perform ablation studies comparing zero-shot learning with few-shot learning and chain-of-thought prompting using GPT-3, OPT, Codex, and ChatGPT across machine learning topics and find that few-shot learning methods perform best. We highlight the transformative potential of language models to streamline the writing and solution of large-scale assessments, significantly reducing the workload from human days to machine seconds.  ( 3 min )
    Adversarial Cheap Talk. (arXiv:2211.11030v2 [cs.LG] UPDATED)
    Adversarial attacks in reinforcement learning (RL) often assume highly-privileged access to the victim's parameters, environment, or data. Instead, this paper proposes a novel adversarial setting called a Cheap Talk MDP in which an Adversary can merely append deterministic messages to the Victim's observation, resulting in a minimal range of influence. The Adversary cannot occlude ground truth, influence underlying environment dynamics or reward signals, introduce non-stationarity, add stochasticity, see the Victim's actions, or access their parameters. Additionally, we present a simple meta-learning algorithm called Adversarial Cheap Talk (ACT) to train Adversaries in this setting. We demonstrate that an Adversary trained with ACT still significantly influences the Victim's training and testing performance, despite the highly constrained setting. Affecting train-time performance reveals a new attack vector and provides insight into the success and failure modes of existing RL algorithms. More specifically, we show that an ACT Adversary is capable of harming performance by interfering with the learner's function approximation, or instead helping the Victim's performance by outputting useful features. Finally, we show that an ACT Adversary can manipulate messages during train-time to directly and arbitrarily control the Victim at test-time. Project video and code are available at https://sites.google.com/view/adversarial-cheap-talk  ( 2 min )
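For intuition, a Cheap Talk environment can be sketched as a thin wrapper (an illustration under assumptions, not the paper's code): the Adversary computes a deterministic message from the current observation and appends it, with no access to the Victim's actions, the dynamics, or the rewards. The step signature follows the classic gym 4-tuple convention.

```python
import numpy as np

class CheapTalkWrapper:
    """Appends a deterministic adversary message to the victim's observation.
    The adversary never sees actions and cannot alter dynamics or rewards."""

    def __init__(self, env, adversary):
        self.env, self.adversary = env, adversary

    def reset(self):
        obs = self.env.reset()
        return np.concatenate([obs, self.adversary(obs)])

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # The message is a deterministic function of the true observation.
        return np.concatenate([obs, self.adversary(obs)]), reward, done, info

# Example adversary: a fixed linear map of the observation, clipped to [-1, 1]
# (shapes are hypothetical; ACT would meta-learn this map instead).
W = np.random.default_rng(0).normal(size=(4, 8))
adversary = lambda obs: np.clip(W @ obs, -1.0, 1.0)
```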
    Early heart disease prediction using hybrid quantum classification. (arXiv:2208.08882v2 [quant-ph] UPDATED)
The rate of heart morbidity and mortality is increasing significantly, affecting global public health and the world economy. Early prediction of heart disease is crucial for reducing heart morbidity and mortality. This paper proposes two quantum machine learning methods, i.e., a hybrid quantum neural network and a hybrid random forest quantum neural network, for early detection of heart disease. The methods are applied to the Cleveland and Statlog datasets. The results show that the hybrid quantum neural network and the hybrid random forest quantum neural network are suitable for high-dimensional and low-dimensional problems, respectively. The hybrid quantum neural network is sensitive to outlier data, while the hybrid random forest is robust to outlier data. A comparison between different machine learning methods shows that the proposed quantum methods are more appropriate for early heart disease prediction, where 96.43% and 97.78% area under the curve are obtained for the Cleveland and Statlog datasets, respectively.  ( 2 min )
    AdaProp: Learning Adaptive Propagation for Graph Neural Network based Knowledge Graph Reasoning. (arXiv:2205.15319v2 [cs.LG] UPDATED)
    Due to the popularity of Graph Neural Networks (GNNs), various GNN-based methods have been designed to reason on knowledge graphs (KGs). An important design component of GNN-based KG reasoning methods is called the propagation path, which contains a set of involved entities in each propagation step. Existing methods use hand-designed propagation paths, ignoring the correlation between the entities and the query relation. In addition, the number of involved entities will explosively grow at larger propagation steps. In this work, we are motivated to learn an adaptive propagation path in order to filter out irrelevant entities while preserving promising targets. First, we design an incremental sampling mechanism where the nearby targets and layer-wise connections can be preserved with linear complexity. Second, we design a learning-based sampling distribution to identify the semantically related entities. Extensive experiments show that our method is powerful, efficient, and semantic-aware. The code is available at https://github.com/LARS-research/AdaProp.  ( 2 min )
    On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning. (arXiv:2210.10763v2 [cs.LG] UPDATED)
Reinforcement Learning (RL) algorithms can solve challenging control problems directly from image observations, but they often require millions of environment interactions to do so. Recently, model-based RL algorithms have greatly improved sample-efficiency by concurrently learning an internal model of the world, and supplementing real environment interactions with imagined rollouts for policy improvement. However, learning an effective model of the world from scratch is challenging, in stark contrast to humans, who rely heavily on world understanding and visual cues for learning new skills. In this work, we investigate whether internal models learned by modern model-based RL algorithms can be leveraged to solve new, distinctly different tasks faster. We propose Model-Based Cross-Task Transfer (XTRA), a framework for sample-efficient online RL with scalable pretraining and finetuning of learned world models. By offline multi-task pretraining and online cross-task finetuning, we achieve substantial improvements over a baseline trained from scratch; we improve mean performance of model-based algorithm EfficientZero by 23%, and by as much as 71% in some instances.  ( 2 min )
    Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adapters. (arXiv:2211.11031v3 [cs.LG] UPDATED)
Deployed models decay over time due to shifting inputs, changing user needs, or emergent knowledge gaps. When harmful behaviors are identified, targeted edits are required. However, current model editors, which adjust specific behaviors of pre-trained models, degrade model performance over multiple edits. We propose GRACE, a Lifelong Model Editing method, which implements spot-fixes on streaming errors of a deployed model, ensuring minimal impact on unrelated inputs. GRACE writes new mappings into a pre-trained model's latent space, creating a discrete, local codebook of edits without altering model weights. This is the first method enabling thousands of sequential edits using only streaming errors. Our experiments on T5, BERT, and GPT models show GRACE's state-of-the-art performance in making and retaining edits, while generalizing to unseen inputs. Our code is available at https://www.github.com/thartvigsen/grace.
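A stripped-down sketch of the codebook mechanic is below; GRACE's actual key matching, deferral radii, and training procedure are more involved, so treat this as a simplified illustration.

```python
import torch

class KeyValueAdapter(torch.nn.Module):
    """Wraps one frozen layer with a discrete codebook of edits. Inputs that
    land within a radius eps of a stored key are routed to the stored value;
    everything else passes through the layer unchanged."""

    def __init__(self, layer, eps=1.0):
        super().__init__()
        self.layer, self.eps = layer, eps
        self.keys, self.values = [], []        # edits live outside the weights

    def add_edit(self, h_err, target_value):
        # Spot-fix a streaming error: remember the offending hidden state and
        # the activation that yields the corrected downstream behavior.
        self.keys.append(h_err.detach())
        self.values.append(target_value.detach())

    def forward(self, h):
        out = self.layer(h)
        for k, v in zip(self.keys, self.values):
            hit = (h - k).norm(dim=-1) < self.eps   # within deferral radius
            out = torch.where(hit.unsqueeze(-1), v, out)
        return out
```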
    Physics-informed neural networks for gravity currents reconstruction from limited data. (arXiv:2211.09715v2 [physics.flu-dyn] UPDATED)
The present work investigates the use of physics-informed neural networks (PINNs) for the 3D reconstruction of unsteady gravity currents from limited data. In the PINN context, the flow fields are reconstructed by training a neural network whose objective function penalizes the mismatch between the network predictions and the observed data and embeds the underlying equations using automatic differentiation. This study relies on a high-fidelity numerical experiment of the canonical lock-exchange configuration. This allows us to benchmark quantitatively the PINNs reconstruction capabilities on several training databases that mimic state-of-the-art experimental measurement techniques for density and velocity. Notably, spatially averaged density measurements by light attenuation technique (LAT) are employed for the training procedure. An optimal experimental setup for flow reconstruction by PINNs is proposed according to two criteria: the implementation complexity and the accuracy of the inferred fields.
    Investigating the Impact of Model Width and Density on Generalization in Presence of Label Noise. (arXiv:2208.08003v4 [cs.LG] UPDATED)
Increasing the size of overparameterized neural networks has been key to achieving state-of-the-art performance. This is captured by the double descent phenomenon, where the test loss follows a decreasing-increasing-decreasing pattern as model width increases. However, the effect of label noise on the test loss curve has not been fully explored. In this work, we uncover an intriguing phenomenon where label noise leads to a \textit{final ascent} in the originally observed double descent curve. Specifically, under a sufficiently large noise-to-sample-size ratio, optimal generalization is achieved at intermediate widths. Through theoretical analysis, we attribute this phenomenon to the shape transition of test loss variance induced by label noise. Furthermore, we extend the final ascent phenomenon to model density and provide the first theoretical characterization showing that reducing density by randomly dropping trainable parameters improves generalization under label noise. We also thoroughly examine the roles of regularization and sample size. Surprisingly, we find that larger $\ell_2$ regularization and robust learning methods against label noise exacerbate the final ascent. We confirm the validity of our findings through extensive experiments on ReLU networks trained on MNIST, ResNets trained on CIFAR-10/100, and InceptionResNet-v2 trained on Stanford Cars with real-world noisy labels.  ( 3 min )
    A Theoretical Framework for AI Models Explainability with Application in Biomedicine. (arXiv:2212.14447v4 [cs.AI] UPDATED)
    EXplainable Artificial Intelligence (XAI) is a vibrant research topic in the artificial intelligence community, with growing interest across methods and domains. Much has been written about the subject, yet XAI still lacks shared terminology and a framework capable of providing structural soundness to explanations. In our work, we address these issues by proposing a novel definition of explanation that is a synthesis of what can be found in the literature. We recognize that explanations are not atomic but the combination of evidence stemming from the model and its input-output mapping, and the human interpretation of this evidence. Furthermore, we fit explanations into the properties of faithfulness (i.e., the explanation being a true description of the model's inner workings and decision-making process) and plausibility (i.e., how much the explanation looks convincing to the user). Using our proposed theoretical framework simplifies how these properties are operationalized and it provides new insight into common explanation methods that we analyze as case studies.
    GREAD: Graph Neural Reaction-Diffusion Networks. (arXiv:2211.14208v3 [cs.LG] UPDATED)
Graph neural networks (GNNs) are one of the most popular research topics for deep learning. GNN methods typically have been designed on top of the graph signal processing theory. In particular, diffusion equations have been widely used for designing the core processing layer of GNNs, and therefore they are inevitably vulnerable to the notorious oversmoothing problem. Recently, a couple of papers have paid attention to reaction equations in conjunction with diffusion equations. However, they all consider limited forms of reaction equations. To this end, we present a reaction-diffusion equation-based GNN method that considers all popular types of reaction equations in addition to one special reaction equation designed by us. To our knowledge, our paper is one of the most comprehensive studies on reaction-diffusion equation-based GNNs. In our experiments with 9 datasets and 28 baselines, our method, called GREAD, outperforms them in a majority of cases. Further synthetic data experiments show that it mitigates the oversmoothing problem and works well for various homophily rates.
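For a feel of the mechanics, the sketch below runs one explicit-Euler step of a generic reaction-diffusion update on node features, dH/dt = -alpha * L H + beta * f(H). The Fisher-type reaction f(H) = H(1 - H) is just one of the popular forms; GREAD's actual layers are learned rather than fixed like this.

```python
import numpy as np

def reaction_diffusion_step(H, A, tau=0.1, alpha=1.0, beta=1.0):
    """One Euler step: diffusion smooths features along edges (and, alone,
    would oversmooth), while the reaction term pushes back locally."""
    L = np.diag(A.sum(axis=1)) - A          # unnormalized graph Laplacian
    return H + tau * (-alpha * L @ H + beta * H * (1.0 - H))

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # path graph
H = np.array([[1.0], [0.2], [0.0]])
for _ in range(5):
    H = reaction_diffusion_step(H, A)
print(H.ravel())
```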
    Benchmarking simulated and physical quantum processing units using quantum and hybrid algorithms. (arXiv:2211.15631v2 [quant-ph] UPDATED)
    Powerful hardware services and software libraries are vital tools for quickly and affordably designing, testing, and executing quantum algorithms. A robust large-scale study of how the performance of these platforms scales with the number of qubits is key to providing quantum solutions to challenging industry problems. This work benchmarks the runtime and accuracy for a representative sample of specialized high-performance simulated and physical quantum processing units. Results show the QMware simulator can reduce the runtime for executing a quantum circuit by up to 78% compared to the next fastest option for algorithms with fewer than 27 qubits. The AWS SV1 simulator offers a runtime advantage for larger circuits, up to the maximum 34 qubits available with SV1. Beyond this limit, QMware can execute circuits as large as 40 qubits. Physical quantum devices, such as Rigetti's Aspen-M2, can provide an exponential runtime advantage for circuits with more than 30 qubits. However, the high financial cost of physical quantum processing units presents a serious barrier to practical use. Moreover, only IonQ's Harmony quantum device achieves high fidelity with more than four qubits. This study paves the way to understanding the optimal combination of available software and hardware for executing practical quantum algorithms.  ( 3 min )
    TIDE: Time Derivative Diffusion for Deep Learning on Graphs. (arXiv:2212.02483v2 [cs.LG] UPDATED)
A prominent paradigm for graph neural networks is based on the message-passing framework. In this framework, information communication is realized only between neighboring nodes. The challenge of approaches that use this paradigm is to ensure efficient and accurate long-distance communication between nodes, as deep convolutional networks are prone to oversmoothing. In this paper, we present a novel method based on time derivative graph diffusion (TIDE) to overcome these structural limitations of the message-passing framework. Our approach allows for optimizing the spatial extent of diffusion across various tasks and network channels, thus enabling medium and long-distance communication efficiently. Furthermore, we show that our architecture design also enables local message-passing and thus inherits the capabilities of local message-passing approaches. We show that on both widely used graph benchmarks and synthetic mesh and graph datasets, the proposed framework outperforms state-of-the-art methods by a significant margin.
    HiveNAS: Neural Architecture Search using Artificial Bee Colony Optimization. (arXiv:2211.10250v2 [cs.NE] UPDATED)
The traditional neural network development process requires substantial expert knowledge and relies heavily on intuition and trial-and-error. Neural Architecture Search (NAS) frameworks were introduced to robustly search for network topologies, as well as facilitate the automated development of Neural Networks. While some optimization approaches -- such as Genetic Algorithms -- have been extensively explored in the NAS context, other Metaheuristic Optimization algorithms have not yet been investigated. In this study, we evaluate the viability of Artificial Bee Colony optimization for Neural Architecture Search. Our proposed framework, HiveNAS, outperforms existing state-of-the-art Swarm Intelligence-based NAS frameworks in a fraction of the time.
    Neuroevolution is a Competitive Alternative to Reinforcement Learning for Skill Discovery. (arXiv:2210.03516v3 [cs.NE] UPDATED)
    Deep Reinforcement Learning (RL) has emerged as a powerful paradigm for training neural policies to solve complex control tasks. However, these policies tend to be overfit to the exact specifications of the task and environment they were trained on, and thus do not perform well when conditions deviate slightly or when composed hierarchically to solve even more complex tasks. Recent work has shown that training a mixture of policies, as opposed to a single one, that are driven to explore different regions of the state-action space can address this shortcoming by generating a diverse set of behaviors, referred to as skills, that can be collectively used to great effect in adaptation tasks or for hierarchical planning. This is typically realized by including a diversity term - often derived from information theory - in the objective function optimized by RL. However these approaches often require careful hyperparameter tuning to be effective. In this work, we demonstrate that less widely-used neuroevolution methods, specifically Quality Diversity (QD), are a competitive alternative to information-theory-augmented RL for skill discovery. Through an extensive empirical evaluation comparing eight state-of-the-art algorithms (four flagship algorithms from each line of work) on the basis of (i) metrics directly evaluating the skills' diversity, (ii) the skills' performance on adaptation tasks, and (iii) the skills' performance when used as primitives for hierarchical planning; QD methods are found to provide equal, and sometimes improved, performance whilst being less sensitive to hyperparameters and more scalable. As no single method is found to provide near-optimal performance across all environments, there is a rich scope for further research which we support by proposing future directions and providing optimized open-source implementations.
    FP-Diffusion: Improving Score-based Diffusion Models by Enforcing the Underlying Score Fokker-Planck Equation. (arXiv:2210.04296v4 [cs.LG] UPDATED)
    Score-based generative models (SGMs) learn a family of noise-conditional score functions corresponding to the data density perturbed with increasingly large amounts of noise. These perturbed data densities are linked together by the Fokker-Planck equation (FPE), a partial differential equation (PDE) governing the spatial-temporal evolution of a density undergoing a diffusion process. In this work, we derive a corresponding equation called the score FPE that characterizes the noise-conditional scores of the perturbed data densities (i.e., their gradients). Surprisingly, despite the impressive empirical performance, we observe that scores learned through denoising score matching (DSM) fail to fulfill the underlying score FPE, which is an inherent self-consistency property of the ground truth score. We prove that satisfying the score FPE is desirable as it improves the likelihood and the degree of conservativity. Hence, we propose to regularize the DSM objective to enforce satisfaction of the score FPE, and we show the effectiveness of this approach across various datasets.  ( 2 min )
    Automated Scoring for Reading Comprehension via In-context BERT Tuning. (arXiv:2205.09864v2 [cs.LG] UPDATED)
    Automated scoring of open-ended student responses has the potential to significantly reduce human grader effort. Recent advances in automated scoring often leverage textual representations based on pre-trained language models such as BERT and GPT as input to scoring models. Most existing approaches train a separate model for each item/question, which is suitable for scenarios such as essay scoring where items can be quite different from one another. However, these approaches have two limitations: 1) they fail to leverage item linkage for scenarios such as reading comprehension where multiple items may share a reading passage; 2) they are not scalable since storing one model per item becomes difficult when models have a large number of parameters. In this paper, we report our (grand prize-winning) solution to the National Assessment of Education Progress (NAEP) automated scoring challenge for reading comprehension. Our approach, in-context BERT fine-tuning, produces a single shared scoring model for all items with a carefully-designed input structure to provide contextual information on each item. We demonstrate the effectiveness of our approach via local evaluations using the training dataset provided by the challenge. We also discuss the biases, common error types, and limitations of our approach.  ( 2 min )
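The core input trick can be sketched with the Hugging Face API as below. The checkpoint name, label count, and plain two-segment structure are placeholders; the paper's NAEP inputs carry richer, carefully designed item context.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# One shared scoring model for all items (checkpoint and label count assumed).
name = "bert-base-uncased"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=4)

def score(item_context, response):
    """Pairs the item's passage/question with the student response in one
    input, so a single model can score all items in context."""
    enc = tok(item_context, response, truncation=True, max_length=512,
              return_tensors="pt")
    return model(**enc).logits.argmax(-1).item()   # predicted score level
```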
    Bayesian Fixed-Budget Best-Arm Identification. (arXiv:2211.08572v3 [cs.LG] UPDATED)
    Fixed-budget best-arm identification (BAI) is a bandit problem where the agent maximizes the probability of identifying the optimal arm within a fixed budget of observations. In this work, we study this problem in the Bayesian setting. We propose a Bayesian elimination algorithm and derive an upper bound on its probability of misidentifying the optimal arm. The bound reflects the quality of the prior and is the first distribution-dependent bound in this setting. We prove it using a frequentist-like argument, where we carry the prior through, and then integrate out the bandit instance at the end. We also provide a lower bound on the probability of misidentification in a $2$-armed Bayesian bandit and show that our upper bound (almost) matches it for any budget. Our experiments show that Bayesian elimination is superior to frequentist methods and competitive with the state-of-the-art Bayesian algorithms that have no guarantees in our setting.
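As a rough sketch (not the paper's algorithm or its analysis), Bayesian elimination for Bernoulli arms can be prototyped as successive halving driven by Beta posterior means:

```python
import numpy as np

def bayes_eliminate(mu, budget=2000, seed=0):
    """Split the budget over ~log2(K) rounds; in each round sample surviving
    arms equally, then drop the half with the lowest Beta posterior means."""
    rng = np.random.default_rng(seed)
    K = len(mu)
    a, b = np.ones(K), np.ones(K)                 # Beta(1, 1) priors
    alive = list(range(K))
    rounds = int(np.ceil(np.log2(K)))
    for _ in range(rounds):
        pulls = budget // rounds // len(alive)
        for i in alive:
            wins = rng.binomial(pulls, mu[i])
            a[i] += wins
            b[i] += pulls - wins
        post_mean = a[alive] / (a[alive] + b[alive])
        keep = np.argsort(post_mean)[len(alive) // 2:]   # keep the better half
        alive = [alive[j] for j in keep]
        if len(alive) == 1:
            break
    return alive[0]

print(bayes_eliminate([0.3, 0.5, 0.55, 0.7]))     # usually identifies arm 3
```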
    LMD: A Learnable Mask Network to Detect Adversarial Examples for Speaker Verification. (arXiv:2211.00825v2 [eess.AS] UPDATED)
Although the security of automatic speaker verification (ASV) is seriously threatened by recently emerged adversarial attacks, there have been some countermeasures to alleviate the threat. However, many defense approaches not only require the prior knowledge of the attackers but also possess weak interpretability. To address this issue, in this paper, we propose an attacker-independent and interpretable method, named learnable mask detector (LMD), to separate adversarial examples from the genuine ones. It utilizes score variation as an indicator to detect adversarial examples, where the score variation is the absolute discrepancy between the ASV scores of an original audio recording and its transformed audio synthesized from its masked complex spectrogram. A core component of the score variation detector is to generate the masked spectrogram by a neural network. The neural network needs only genuine examples for training, which makes it an attacker-independent approach. Its interpretability lies in the fact that the neural network is trained to minimize the score variation of the targeted ASV, and maximize the number of the masked spectrogram bins of the genuine training examples. Its foundation is based on the observation that masking out the vast majority of the spectrogram bins with little speaker information will inevitably introduce a large score variation to the adversarial example, and a small score variation to the genuine example. Experimental results with 12 attackers and two representative ASV systems show that our proposed method outperforms five state-of-the-art baselines. The extensive experimental results can also be a benchmark for the detection-based ASV defenses.  ( 3 min )
    How Does Pseudo-Labeling Affect the Generalization Error of the Semi-Supervised Gibbs Algorithm?. (arXiv:2210.08188v2 [cs.IT] UPDATED)
    We provide an exact characterization of the expected generalization error (gen-error) for semi-supervised learning (SSL) with pseudo-labeling via the Gibbs algorithm. The gen-error is expressed in terms of the symmetrized KL information between the output hypothesis, the pseudo-labeled dataset, and the labeled dataset. Distribution-free upper and lower bounds on the gen-error can also be obtained. Our findings offer new insights that the generalization performance of SSL with pseudo-labeling is affected not only by the information between the output hypothesis and input training data but also by the information {\em shared} between the {\em labeled} and {\em pseudo-labeled} data samples. This serves as a guideline to choose an appropriate pseudo-labeling method from a given family of methods. To deepen our understanding, we further explore two examples -- mean estimation and logistic regression. In particular, we analyze how the ratio of the number of unlabeled to labeled data $\lambda$ affects the gen-error under both scenarios. As $\lambda$ increases, the gen-error for mean estimation decreases and then saturates at a value larger than when all the samples are labeled, and the gap can be quantified {\em exactly} with our analysis, and is dependent on the \emph{cross-covariance} between the labeled and pseudo-labeled data samples. For logistic regression, the gen-error and the variance component of the excess risk also decrease as $\lambda$ increases.  ( 2 min )
    Multi-channel Autobidding with Budget and ROI Constraints. (arXiv:2302.01523v3 [cs.GT] UPDATED)
In digital online advertising, advertisers procure ad impressions simultaneously on multiple platforms, or so-called channels, such as Google Ads, Meta Ads Manager, etc., each of which consists of numerous ad auctions. We study how an advertiser maximizes total conversion (e.g. ad clicks) while satisfying aggregate return-on-investment (ROI) and budget constraints across all channels. In practice, an advertiser does not have control over, and thus cannot globally optimize, which individual ad auctions she participates in for each channel, and instead authorizes a channel to procure impressions on her behalf: the advertiser can only utilize two levers on each channel, namely setting a per-channel budget and per-channel target ROI. In this work, we first analyze the effectiveness of each of these levers for solving the advertiser's global multi-channel problem. We show that when an advertiser only optimizes over per-channel ROIs, her total conversion can be arbitrarily worse than what she could have obtained in the global problem. Further, we show that the advertiser can achieve the global optimal conversion when she only optimizes over per-channel budgets. In light of this finding, under a bandit feedback setting that mimics real-world scenarios where advertisers have limited information on ad auctions in each channel and how channels procure ads, we present an efficient learning algorithm that produces per-channel budgets whose resulting conversion approximates that of the global optimal problem. Finally, we argue that all our results hold for both single-item and multi-item auctions from which channels procure impressions on advertisers' behalf.
    CORL: Research-oriented Deep Offline Reinforcement Learning Library. (arXiv:2210.07105v3 [cs.LG] UPDATED)
CORL is an open-source library that provides thoroughly benchmarked single-file implementations of both deep offline and offline-to-online reinforcement learning algorithms. It emphasizes a simple development experience with a straightforward codebase and a modern analysis tracking tool. In CORL, we isolate method implementations into separate single files, making performance-relevant details easier to recognize. Additionally, an experiment tracking feature is available to help log metrics, hyperparameters, dependencies, and more to the cloud. Finally, we have ensured the reliability of the implementations by benchmarking on commonly employed D4RL datasets, providing a transparent source of results that can be reused for robust evaluation tools such as performance profiles, probability of improvement, or expected online performance.  ( 2 min )
    Where Will Players Move Next? Dynamic Graphs and Hierarchical Fusion for Movement Forecasting in Badminton. (arXiv:2211.12217v2 [cs.LG] UPDATED)
Sports analytics has captured increasing attention since analysis of the various data enables insights for training strategies, player evaluation, etc. In this paper, we focus on predicting what types of return strokes will be made, and where players will move to, based on previous strokes. As this problem has not been addressed to date, movement forecasting can be tackled through sequence-based and graph-based models by formulating it as a sequence prediction task. However, existing sequence-based models neglect the effects of interactions between players, and graph-based models still suffer from multifaceted perspectives on the next movement. Moreover, there is no existing work on representing strategic relations among players' shot types and movements. To address these challenges, we first introduce the construction procedure of the Player Movements (PM) graph to exploit the structural movements of players with strategic relations. Based on the PM graph, we propose a novel Dynamic Graphs and Hierarchical Fusion for Movement Forecasting model (DyMF) with interaction style extractors to capture each player's own dynamics, the mutual interactions between both players within a rally, and players' evolving tactics across time. In addition, hierarchical fusion modules are designed to incorporate the style influence of both players and rally interactions. Extensive experiments show that our model empirically outperforms both sequence- and graph-based methods and demonstrate the practical usage of movement forecasting.
    Second-order optimization with lazy Hessians. (arXiv:2212.00781v3 [math.OC] UPDATED)
    We analyze Newton's method with lazy Hessian updates for solving general possibly non-convex optimization problems. We propose to reuse a previously seen Hessian for several iterations while computing new gradients at each step of the method. This significantly reduces the overall arithmetical complexity of second-order optimization schemes. By using the cubic regularization technique, we establish fast global convergence of our method to a second-order stationary point, while the Hessian does not need to be updated each iteration. For convex problems, we justify global and local superlinear rates for lazy Newton steps with quadratic regularization, which is easier to compute. The optimal frequency for updating the Hessian is once every $d$ iterations, where $d$ is the dimension of the problem. This provably improves the total arithmetical complexity of second-order algorithms by a factor $\sqrt{d}$.
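A minimal numeric sketch of the lazy-Hessian idea follows, with plain damping standing in for the paper's cubic/quadratic regularization; the problem data is synthetic.

```python
import numpy as np

def lazy_newton(grad, hess, x0, d, steps=50, damping=1e-6):
    """Newton's method that refreshes the Hessian only every d iterations
    and reuses it in between, while gradients stay fresh at every step."""
    x = x0.copy()
    for t in range(steps):
        if t % d == 0:                # recompute Hessian once per d steps
            H = hess(x) + damping * np.eye(len(x))
        x = x - np.linalg.solve(H, grad(x))
    return x

# Toy strongly convex quadratic: f(x) = 0.5 x^T A x - b^T x.
rng = np.random.default_rng(0)
A = rng.normal(size=(10, 10)); A = A @ A.T + 10 * np.eye(10)
b = rng.normal(size=10)
x = lazy_newton(lambda x: A @ x - b, lambda x: A, np.zeros(10), d=10)
print(np.linalg.norm(x - np.linalg.solve(A, b)))  # near zero
```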
    Mnemosyne: Learning to Train Transformers with Transformers. (arXiv:2302.01128v2 [cs.LG] UPDATED)
Training complex machine learning (ML) architectures requires a compute- and time-consuming process of selecting the right optimizer and tuning its hyper-parameters. A new paradigm of learning optimizers from data has emerged as a better alternative to hand-designed ML optimizers. We propose the Mnemosyne optimizer, which uses Performers: implicit low-rank attention Transformers. It can learn to train entire neural network architectures, including other Transformers, without any task-specific optimizer tuning. We show that Mnemosyne: (a) generalizes better than the popular LSTM optimizer, (b) in particular can successfully train Vision Transformers (ViTs) while meta-trained on standard MLPs and (c) can initialize optimizers for faster convergence in Robotics applications. We believe that these results open the possibility of using Transformers to build foundational optimization models that can address the challenges of regular Transformer training. We complement our results with an extensive theoretical analysis of the compact associative memory used by Mnemosyne.
    Self-Supervised Training with Autoencoders for Visual Anomaly Detection. (arXiv:2206.11723v3 [cs.CV] UPDATED)
Deep convolutional autoencoders provide an effective tool for learning non-linear dimensionality reduction in an unsupervised way. Recently, they have been used for the task of anomaly detection in the visual domain. By optimising for the reconstruction error using anomaly-free examples, the common belief is that a corresponding network should fail to accurately reconstruct anomalous regions in the application phase. This goal is typically addressed by controlling the capacity of the network by either reducing the size of the bottleneck layer or enforcing sparsity constraints on its activations. However, neither of these techniques explicitly penalizes reconstruction of anomalous signals, often resulting in poor detection. We tackle this problem by adapting a self-supervised learning regime, which allows the use of discriminative information during training while focusing on the data manifold by means of a modified reconstruction error. This regularizes the model to produce locally consistent reconstructions, while replacing irregularities, acting as a filter for anomalous patterns. In contrast to related approaches, our method is very efficient during both training and prediction, processing the entire input image in a single step. Our experiments on the MVTec AD dataset demonstrate high recognition and localization performance of the proposed method. On the texture subset, in particular, our approach consistently outperforms a number of recent anomaly detection methods by a large margin.  ( 2 min )
    Pareto Manifold Learning: Tackling multiple tasks via ensembles of single-task models. (arXiv:2210.09759v2 [cs.LG] UPDATED)
In Multi-Task Learning (MTL), tasks may compete and limit the performance achieved on each other, rather than guiding the optimization to a solution, superior to all its single-task trained counterparts. Since there is often not a unique solution optimal for all tasks, practitioners have to balance tradeoffs between tasks' performance, and resort to optimality in the Pareto sense. Most MTL methodologies either completely neglect this aspect, and instead of aiming at learning a Pareto Front, produce one solution predefined by their optimization schemes, or produce diverse but discrete solutions. Recent approaches parameterize the Pareto Front via neural networks, leading to complex mappings from tradeoff to objective space. In this paper, we conjecture that the Pareto Front admits a linear parameterization in parameter space, which leads us to propose \textit{Pareto Manifold Learning}, an ensembling method in weight space. Our approach produces a continuous Pareto Front in a single training run, that allows modulating the performance on each task during inference. Experiments on multi-task learning benchmarks, ranging from image classification to tabular datasets and scene understanding, show that \textit{Pareto Manifold Learning} outperforms state-of-the-art single-point algorithms, while learning a better Pareto parameterization than multi-point baselines.  ( 2 min )
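The weight-space picture can be sketched as below, assuming two endpoint models of the same architecture. Note that Pareto Manifold Learning trains the whole segment jointly in a single run, whereas this toy version merely blends two fixed endpoints.

```python
import copy
import torch

def blend(model_a, model_b, alpha):
    """Linear interpolation in weight space: theta(alpha) = alpha * theta_a
    + (1 - alpha) * theta_b. If the segment lies on the Pareto Front,
    sweeping alpha at inference modulates the tradeoff between tasks."""
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    blended = {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a}
    model = copy.deepcopy(model_a)
    model.load_state_dict(blended)
    return model

# e.g. trace a front: results = [evaluate(blend(a, b, t / 10)) for t in range(11)]
```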
    Learning under Data Drift with Time-Varying Importance Weights. (arXiv:2210.01422v3 [cs.LG] UPDATED)
    Real-world deployment of machine learning models is challenging because data evolves over time. While no model can work when data evolves in an arbitrary fashion, if there is some pattern to these changes, we might be able to design methods to address it. This paper addresses situations when data evolves gradually. We introduce a time-varying propensity score that can detect gradual shifts in the distribution of data which allows us to selectively sample past data to update the model -- not just similar data from the past like that of a standard propensity score but also data that evolved in a similar fashion in the past. The time-varying propensity score is quite general: we demonstrate different ways of implementing it and evaluate it on a variety of problems ranging from supervised learning (e.g., image classification problems) where data undergoes a sequence of gradual shifts, to reinforcement learning tasks (e.g., robotic manipulation and continuous control) where data shifts as the policy or the task changes.  ( 2 min )
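A static cousin of this idea is the classifier-based propensity score sketched below: train a probabilistic classifier to separate past from present data and weight past samples by the implied density ratio. The time-varying score in the paper additionally conditions on how the data evolved over time, which this simplification omits.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def propensity_weights(X_past, X_now):
    """Weight past samples by p/(1-p), where p is the probability the
    classifier assigns to a sample looking like current data."""
    X = np.vstack([X_past, X_now])
    y = np.r_[np.zeros(len(X_past)), np.ones(len(X_now))]
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    p = clf.predict_proba(X_past)[:, 1]
    return p / (1 - p + 1e-8)   # importance weights for sampling past data
```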
    BeGin: Extensive Benchmark Scenarios and An Easy-to-use Framework for Graph Continual Learning. (arXiv:2211.14568v2 [cs.LG] UPDATED)
    Continual Learning (CL) is the process of learning ceaselessly a sequence of tasks. Most existing CL methods deal with independent data (e.g., images and text) for which many benchmark frameworks and results under standard experimental settings are available. However, CL methods for graph data (graph CL) are surprisingly underexplored because of (a) the lack of standard experimental settings, especially regarding how to deal with the dependency between instances, (b) the lack of benchmark datasets and scenarios, and (c) high complexity in implementation and evaluation due to the dependency. In this paper, regarding (a), we define four standard incremental settings (task-, class-, domain-, and time-incremental) for graph data, which are naturally applied to many node-, link-, and graph-level problems. Regarding (b), we provide 25 benchmark scenarios based on 15 real-world graphs. Regarding (c), we develop BeGin, an easy and fool-proof framework for graph CL. BeGin is easily extended since it is modularized with reusable modules for data processing, algorithm design, and evaluation. Especially, the evaluation module is completely separated from user code to eliminate potential mistakes. Using all the above, we report extensive benchmark results of 10 graph CL methods. Compared to the latest benchmark for graph CL, using BeGin, we cover 3x more combinations of incremental settings and levels of problems. All assets for the benchmark framework are available at https://github.com/ShinhwanKang/BeGin.
    MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning. (arXiv:2210.04183v3 [cs.CV] UPDATED)
    Multimodal representation learning has shown promising improvements on various vision-language tasks. Most existing methods excel at building global-level alignment between vision and language while lacking effective fine-grained image-text interaction. In this paper, we propose a jointly masked multimodal modeling method to learn fine-grained multimodal representations. Our method performs joint masking on image-text input and integrates both implicit and explicit targets for the masked signals to recover. The implicit target provides a unified and debiased objective for vision and language, where the model predicts latent multimodal representations of the unmasked input. The explicit target further enriches the multimodal representations by recovering high-level and semantically meaningful information: momentum visual features of image patches and concepts of word tokens. Through such a masked modeling process, our model not only learns fine-grained multimodal interaction, but also avoids the semantic gap between high-level representations and low- or mid-level prediction targets (e.g. image pixels), thus producing semantically rich multimodal representations that perform well on both zero-shot and fine-tuned settings. Our pre-trained model (named MAMO) achieves state-of-the-art performance on various downstream vision-language tasks, including image-text retrieval, visual question answering, visual reasoning, and weakly-supervised visual grounding.
    Visualizing Deep Neural Networks with Topographic Activation Maps. (arXiv:2204.03528v2 [cs.LG] UPDATED)
    Machine Learning with Deep Neural Networks (DNNs) has become a successful tool in solving tasks across various fields of application. However, the complexity of DNNs makes it difficult to understand how they solve their learned task. To improve the explainability of DNNs, we adapt methods from neuroscience that analyze complex and opaque systems. Here, we draw inspiration from how neuroscience uses topographic maps to visualize brain activity. To also visualize activations of neurons in DNNs as topographic maps, we research techniques to layout the neurons in a two-dimensional space such that neurons of similar activity are in the vicinity of each other. In this work, we introduce and compare methods to obtain a topographic layout of neurons in a DNN layer. Moreover, we demonstrate how to use topographic activation maps to identify errors or encoded biases and to visualize training processes. Our novel visualization technique improves the transparency of DNN-based decision-making systems and is interpretable without expert knowledge in Machine Learning.  ( 2 min )
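One plausible recipe for such a layout is sketched below with off-the-shelf tools; the paper introduces and compares several layout methods, and t-SNE on an activation-correlation distance is only an assumed stand-in.

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

def topographic_layout(acts):
    """acts: (n_samples, n_neurons). Embeds neurons in 2D so that neurons
    with correlated activations land near each other."""
    dist = np.clip(1.0 - np.corrcoef(acts.T), 0.0, None)  # correlation distance
    np.fill_diagonal(dist, 0.0)
    return TSNE(metric="precomputed", init="random",
                perplexity=5).fit_transform(dist)

acts = np.random.default_rng(0).normal(size=(200, 64))   # placeholder activations
xy = topographic_layout(acts)
plt.scatter(xy[:, 0], xy[:, 1], c=acts[0], cmap="coolwarm")  # map for one input
plt.savefig("topomap.png")
```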
    Using Neural Networks for Novelty-based Test Selection to Accelerate Functional Coverage Closure. (arXiv:2207.00445v3 [cs.SE] UPDATED)
Novel test selectors used in simulation-based verification have been shown to significantly accelerate coverage closure regardless of the number of coverage holes. This paper presents a configurable and highly-automated framework for novel test selection based on neural networks. Three configurations of this framework are tested with a commercial signal processing unit. All three convincingly outperform random test selection, with the largest saving being 49.37% of simulation to reach 99.5% coverage. The computational expense of the configurations is negligible compared to the simulation reduction. We compare the experimental results and discuss important characteristics related to the performance of the configurations.  ( 2 min )
    PromptCast: A New Prompt-based Learning Paradigm for Time Series Forecasting. (arXiv:2210.08964v3 [stat.ME] UPDATED)
    This paper presents a new perspective on time series forecasting. In existing time series forecasting methods, the models take a sequence of numerical values as input and yield numerical values as output. The existing SOTA models are largely based on the Transformer architecture, modified with multiple encoding mechanisms to incorporate the context and semantics around the historical data. Inspired by the successes of pre-trained language foundation models, we pose a question about whether these models can also be adapted to solve time-series forecasting. Thus, we propose a new forecasting paradigm: prompt-based time series forecasting (PromptCast). In this novel task, the numerical input and output are transformed into prompts and the forecasting task is framed in a sentence-to-sentence manner, making it possible to directly apply language models for forecasting purposes. To support and facilitate the research of this task, we also present a large-scale dataset (PISA) that includes three real-world forecasting scenarios. We evaluate different SOTA numerical-based forecasting methods and language generation models. The benchmark results with various forecasting settings demonstrate the proposed PromptCast with language generation models is a promising research direction. Additionally, in comparison to conventional numerical-based forecasting, PromptCast shows a much better generalization ability under the zero-shot setting.
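The input transformation itself is simple; a toy template (illustrative wording, not the paper's PISA templates) might look like this:

```python
def to_prompt(history, city="CityA", horizon=1):
    """Turn a numeric series into a sentence-to-sentence forecasting query."""
    past = ", ".join(f"{v} degrees" for v in history)
    return (f"From day 1 to day {len(history)}, the temperature in {city} "
            f"was {past}. What will the temperature be on day "
            f"{len(history) + horizon}?")

print(to_prompt([21, 23, 22, 24]))
# The language model's sentence answer (e.g. "It will be 24 degrees.") is
# parsed back into a number to obtain the forecast.
```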
    White-Box Adversarial Policies in Deep Reinforcement Learning. (arXiv:2209.02167v2 [cs.AI] UPDATED)
    In reinforcement learning (RL), adversarial policies can be developed by training an adversarial agent to minimize a target agent's rewards. Prior work has studied black-box versions of these attacks where the adversary only observes the world state and treats the target agent as any other part of the environment. However, this does not take into account additional structure in the problem. In this work, we take inspiration from the literature on white-box attacks to train more effective adversarial policies. We study white-box adversarial policies and show that having access to a target agent's internal state can be useful for identifying its vulnerabilities. We make two contributions. (1) We introduce white-box adversarial policies where an attacker observes both a target's internal state and the world state at each timestep. We formulate ways of using these policies to attack agents in 2-player games and text-generating language models. (2) We demonstrate that these policies can achieve higher initial and asymptotic performance against a target agent than black-box controls. Code is available at https://github.com/thestephencasper/lm_white_box_attacks
    Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networks. (arXiv:2202.00293v4 [stat.ML] UPDATED)
    Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achieve global convergence under gradient descent. The picture can be radically different for narrow networks, which tend to get stuck in badly-generalizing local minima. Here we investigate the cross-over between these two regimes in the high-dimensional setting, and in particular investigate the connection between the so-called mean-field/hydrodynamic regime and the seminal approach of Saad & Solla. Focusing on the case of Gaussian data, we study the interplay between the learning rate, the time scale, and the number of hidden units in the high-dimensional dynamics of stochastic gradient descent (SGD). Our work builds on a deterministic description of SGD in high-dimensions from statistical physics, which we extend and for which we provide rigorous convergence rates.
    Causal Modeling of Policy Interventions From Sequences of Treatments and Outcomes. (arXiv:2209.04142v5 [cs.LG] UPDATED)
    A treatment policy defines when and what treatments are applied to affect some outcome of interest. Data-driven decision-making requires the ability to predict what happens if a policy is changed. Existing methods that predict how the outcome evolves under different scenarios assume that the tentative sequences of future treatments are fixed in advance, while in practice the treatments are determined stochastically by a policy and may depend, for example, on the efficiency of previous treatments. Therefore, the current methods are not applicable if the treatment policy is unknown or a counterfactual analysis is needed. To handle these limitations, we model the treatments and outcomes jointly in continuous time, by combining Gaussian processes and point processes. Our model enables the estimation of a treatment policy from observational sequences of treatments and outcomes, and it can predict the interventional and counterfactual progression of the outcome after an intervention on the treatment policy (in contrast with the causal effect of a single treatment). We show with real-world and semi-synthetic data on blood glucose progression that our method can answer causal queries more accurately than existing alternatives.
    HRFuser: A Multi-resolution Sensor Fusion Architecture for 2D Object Detection. (arXiv:2206.15157v2 [cs.CV] UPDATED)
    Besides standard cameras, autonomous vehicles typically include multiple additional sensors, such as lidars and radars, which help acquire richer information for perceiving the content of the driving scene. While several recent works focus on fusing certain pairs of sensors - such as camera with lidar or radar - by using architectural components specific to the examined setting, a generic and modular sensor fusion architecture is missing from the literature. In this work, we propose HRFuser, a modular architecture for multi-modal 2D object detection. It fuses multiple sensors in a multi-resolution fashion and scales to an arbitrary number of input modalities. The design of HRFuser is based on state-of-the-art high-resolution networks for image-only dense prediction and incorporates a novel multi-window cross-attention block as the means to perform fusion of multiple modalities at multiple resolutions. We demonstrate via extensive experiments on nuScenes and the adverse conditions DENSE datasets that our model effectively leverages complementary features from additional modalities, substantially improving upon camera-only performance and consistently outperforming state-of-the-art 3D and 2D fusion methods evaluated on 2D object detection metrics. The source code is publicly available.
    Analysis and Comparison of Classification Metrics. (arXiv:2209.05355v3 [cs.LG] UPDATED)
    A variety of different performance metrics are commonly used in the machine learning literature for the evaluation of classification systems. Some of the most common ones for measuring quality of hard decisions are standard and balanced accuracy, standard and balanced error rate, F-beta score, and Matthews correlation coefficient (MCC). In this document, we review the definition of these and other metrics and compare them with the expected cost (EC), a metric introduced in every statistical learning course but rarely used in the machine learning literature. We show that both the standard and balanced error rates are special cases of the EC. Further, we show its relation with F-score and MCC and argue that EC is superior to these traditional metrics, being more elegant, general, and intuitive, as well as being based on basic principles from statistics. The metrics above measure the quality of hard decisions. Yet, most modern classification systems output continuous scores for the classes which we may want to evaluate directly. Metrics for measuring the quality of system scores include the area under the ROC curve, equal error rate, cross-entropy, Brier score, and Bayes EC or Bayes risk, among others. The last three metrics are special cases of a family of metrics given by the expected value of proper scoring rules (PSRs). We review the theory behind these metrics and argue that they are the most principled way to measure the quality of the posterior probabilities produced by a system. Finally, we show how to use these metrics to compute the system's calibration loss and compare this metric with the standard expected calibration error (ECE), arguing that calibration loss based on PSRs is superior to the ECE for a variety of reasons.
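    As a quick illustration of the special-case claim, here is a minimal NumPy sketch of the EC; the confusion-matrix representation and the toy numbers are ours, not the paper's:

        import numpy as np

        def expected_cost(conf, costs, priors=None):
            # conf[i, j] = P(decide class j | true class i), rows sum to 1
            # costs[i, j] = cost of deciding j when the truth is i
            n = conf.shape[0]
            priors = np.full(n, 1.0 / n) if priors is None else np.asarray(priors)
            return float((priors[:, None] * conf * costs).sum())

        # With 0-1 costs, EC reduces to an error rate: the balanced error rate
        # under uniform priors (as here), the standard one under empirical priors.
        conf = np.array([[0.9, 0.1],
                         [0.3, 0.7]])
        print(expected_cost(conf, costs=1.0 - np.eye(2)))   # 0.2 = (0.1 + 0.3) / 2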
    PLAtE: A Large-scale Dataset for List Page Web Extraction. (arXiv:2205.12386v2 [cs.CL] UPDATED)
    Recently, neural models have been leveraged to significantly improve the performance of information extraction from semi-structured websites. However, a barrier for continued progress is the small number of datasets large enough to train these models. In this work, we introduce the PLAtE (Pages of Lists Attribute Extraction) benchmark dataset as a challenging new web extraction task. PLAtE focuses on shopping data, specifically extractions from product review pages with multiple items, encompassing the tasks of: (1) finding product-list segmentation boundaries and (2) extracting attributes for each product. PLAtE is composed of 52,898 items collected from 6,694 pages and 156,014 attributes, making it the first large-scale list page web extraction dataset. We use a multi-stage approach to collect and annotate the dataset and adapt three state-of-the-art web extraction models to the two tasks, comparing their strengths and weaknesses both quantitatively and qualitatively.
    Cognitive Ledger Project: Towards Building Personal Digital Twins Through Cognitive Blockchain. (arXiv:2201.08163v2 [cs.AI] UPDATED)
    The Cognitive Ledger Project is an effort to develop a modular system for turning users' personal data into structured information and machine learning models on top of a blockchain-based infrastructure. In this work-in-progress paper, we propose a cognitive architecture for cognitive digital twins. The suggested design embraces a cognitive blockchain (Cognitive Ledger) at its core. The architecture includes several modules that turn users' activities in the digital environment into reusable knowledge objects and artificial intelligence components that can one day work together to form the cognitive digital twin of users.
    Curricular Subgoals for Inverse Reinforcement Learning. (arXiv:2306.08232v1 [cs.LG])
    Inverse Reinforcement Learning (IRL) aims to reconstruct the reward function from expert demonstrations to facilitate policy learning, and has demonstrated remarkable success in imitation learning. To promote expert-like behavior, existing IRL methods mainly focus on learning global reward functions to minimize the trajectory difference between the imitator and the expert. However, these global designs are still limited by redundant noise and error propagation, leading to unsuitable reward assignment and thus degrading the agent's capability in complex multi-stage tasks. In this paper, we propose a novel Curricular Subgoal-based Inverse Reinforcement Learning (CSIRL) framework that explicitly disentangles one task into several local subgoals to guide agent imitation. Specifically, CSIRL first uses the decision uncertainty of the trained agent over expert trajectories to dynamically select subgoals, which directly determines the exploration boundary of different task stages. To further acquire local reward functions for each stage, we customize a meta-imitation objective based on these curricular subgoals to train an intrinsic reward generator. Experiments on the D4RL and autonomous driving benchmarks demonstrate that the proposed method yields results superior to state-of-the-art counterparts, as well as better interpretability. Our code is available at https://github.com/Plankson/CSIRL.
    ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations. (arXiv:2306.08141v1 [cs.AI])
    As generative AI becomes more prevalent, it is important to study how human users interact with such models. In this work, we investigate how people use text-to-image models to generate desired target images. To study this interaction, we created ArtWhisperer, an online game where users are given a target image and are tasked with iteratively finding a prompt that creates a similar-looking image as the target. Through this game, we recorded over 50,000 human-AI interactions; each interaction corresponds to one text prompt created by a user and the corresponding generated image. The majority of these are repeated interactions where a user iterates to find the best prompt for their target image, making this a unique sequential dataset for studying human-AI collaborations. In an initial analysis of this dataset, we identify several characteristics of prompt interactions and user strategies. People submit diverse prompts and are able to discover a variety of text descriptions that generate similar images. Interestingly, prompt diversity does not decrease as users find better prompts. We further propose a new metric to study the steerability of AI using our dataset. We define steerability as the expected number of interactions required to adequately complete a task. We estimate this value by fitting a Markov chain for each target task and calculating the expected time to reach an adequate score in the Markov chain. We quantify and compare AI steerability across different types of target images and two different models, finding that images of cities and natural world images are more steerable than artistic and fantasy images. These findings provide insights into human-AI interaction behavior, present a concrete method of assessing AI steerability, and demonstrate the general utility of the ArtWhisperer dataset.
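    The steerability metric above amounts to an absorption-time computation on a Markov chain. A minimal sketch, assuming scores have been discretized into states and the transition matrix estimated from interaction logs:

        import numpy as np

        def expected_interactions(P, adequate):
            # P[i, j]: probability that one prompt iteration moves the score
            # from state i to state j; `adequate` marks absorbing states whose
            # score already counts as adequate.
            Q = P[np.ix_(~adequate, ~adequate)]          # transient-to-transient block
            t = np.linalg.solve(np.eye(Q.shape[0]) - Q, np.ones(Q.shape[0]))
            steps = np.zeros(P.shape[0])
            steps[~adequate] = t                         # adequate states need 0 steps
            return steps

    This solves the standard linear system (I - Q) t = 1 for the transient states; how scores are binned and how P is fit are assumptions on our part.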
    Volterra Neural Networks (VNNs). (arXiv:1910.09616v5 [cs.CV] UPDATED)
    The importance of inference in Machine Learning (ML) has led to an explosive number of different proposals in ML, and particularly in Deep Learning. In an attempt to reduce the complexity of Convolutional Neural Networks, we propose a Volterra filter-inspired Network architecture. This architecture introduces controlled non-linearities in the form of interactions between the delayed input samples of data. We propose a cascaded implementation of Volterra Filtering so as to significantly reduce the number of parameters required to carry out the same classification task as that of a conventional Neural Network. We demonstrate an efficient parallel implementation of this Volterra Neural Network (VNN), along with its remarkable performance while retaining a relatively simpler and potentially more tractable structure. Furthermore, we show a rather sophisticated adaptation of this network to nonlinearly fuse the RGB (spatial) information and the Optical Flow (temporal) information of a video sequence for action recognition. The proposed approach is evaluated on the UCF-101 and HMDB-51 datasets for action recognition, and is shown to outperform state-of-the-art CNN approaches.
    h2oGPT: Democratizing Large Language Models. (arXiv:2306.08161v1 [cs.CL])
    Foundation Large Language Models (LLMs) such as GPT-4 represent a revolution in AI due to their real-world applications through natural language processing. However, they also pose many significant risks such as the presence of biased, private, or harmful text, and the unauthorized inclusion of copyrighted material. We introduce h2oGPT, a suite of open-source code repositories for the creation and use of Large Language Models (LLMs) based on Generative Pretrained Transformers (GPTs). The goal of this project is to create the world's best truly open-source alternative to closed-source GPTs. In collaboration with and as part of the incredible and unstoppable open-source community, we open-source several fine-tuned h2oGPT models from 7 to 40 billion parameters, ready for commercial use under fully permissive Apache 2.0 licenses. Included in our release is 100% private document search using natural language. Open-source language models help boost AI development and make it more accessible and trustworthy. They lower entry hurdles, allowing people and groups to tailor these models to their needs. This openness increases innovation, transparency, and fairness. An open-source strategy is needed to share AI benefits fairly, and H2O.ai will continue to democratize AI and LLMs.
    Leveraging dendritic properties to advance machine learning and neuro-inspired computing. (arXiv:2306.08007v1 [cs.NE])
    The brain is a remarkably capable and efficient system. It can process and store huge amounts of noisy and unstructured information using minimal energy. In contrast, current artificial intelligence (AI) systems require vast resources for training while still struggling to compete in tasks that are trivial for biological agents. Thus, brain-inspired engineering has emerged as a promising new avenue for designing sustainable, next-generation AI systems. Here, we describe how dendritic mechanisms of biological neurons have inspired innovative solutions for significant AI problems, including credit assignment in multilayer networks, catastrophic forgetting, and high energy consumption. These findings provide exciting alternatives to existing architectures, showing how dendritic research can pave the way for building more powerful and energy-efficient artificial learning systems.
    Pruning the Way to Reliable Policies: A Multi-Objective Deep Q-Learning Approach to Critical Care. (arXiv:2306.08044v1 [cs.LG])
    Most medical treatment decisions are sequential in nature. Hence, there is substantial hope that reinforcement learning may make it possible to formulate precise data-driven treatment plans. However, a key challenge for most applications in this field is the sparse nature of primarily mortality-based reward functions, leading to decreased stability of offline estimates. In this work, we introduce a deep Q-learning approach able to obtain more reliable critical care policies. This method integrates relevant but noisy intermediate biomarker signals into the reward specification, without compromising the optimization of the main outcome of interest (e.g. patient survival). We achieve this by first pruning the action set based on all available rewards, and second training a final model based on the sparse main reward but with a restricted action set. By disentangling accurate and approximated rewards through action pruning, potential distortions of the main objective are minimized, all while enabling the extraction of valuable information from intermediate signals that can guide the learning process. We evaluate our method in both off-policy and offline settings using simulated environments and real health records of patients in intensive care units. Our empirical results indicate that pruning significantly reduces the size of the action space while staying mostly consistent with the actions taken by physicians, outperforming the current state-of-the-art offline reinforcement learning method, conservative Q-learning. Our work is a step towards developing reliable policies by effectively harnessing the wealth of available information in data-intensive critical care environments.
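    A minimal sketch of the first (pruning) stage in tabular form; the kept fraction and the tabular setting are assumptions, not the paper's configuration:

        import numpy as np

        def prune_action_set(q_aux, keep_frac=0.5):
            # q_aux[s, a]: Q-values trained on all rewards, including the noisy
            # intermediate biomarker signals (stage 1).
            k = max(1, int(keep_frac * q_aux.shape[1]))
            keep = np.argsort(q_aux, axis=1)[:, -k:]          # top-k actions per state
            mask = np.zeros_like(q_aux, dtype=bool)
            np.put_along_axis(mask, keep, True, axis=1)
            return mask   # stage 2 trains on the sparse main reward within this mask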
    Feature Engineering-Based Detection of Buffer Overflow Vulnerability in Source Code Using Neural Networks. (arXiv:2306.07981v1 [cs.CR])
    One of the most significant challenges in the field of software code auditing is the presence of vulnerabilities in software source code. Every year, more and more software flaws are discovered, either internally in proprietary code or publicly disclosed. These flaws are highly likely to be exploited and can lead to system compromise, data leakage, or denial of service. To create a large-scale machine learning system for function-level vulnerability identification, we utilized a sizable dataset of C and C++ open-source code containing millions of functions with potential buffer overflow exploits. We have developed an efficient and scalable vulnerability detection method based on neural network models that learn features extracted from the source code. The source code is first converted into an intermediate representation to remove unnecessary components and shorten dependencies. We maintain the semantic and syntactic information using state-of-the-art word embedding algorithms such as GloVe and fastText. The embedded vectors are subsequently fed into neural networks such as LSTM, BiLSTM, LSTM Autoencoder, word2vec, BERT, and GPT2 to classify the possible vulnerabilities. Furthermore, we have proposed a neural network model that can overcome issues associated with traditional neural networks. We have used evaluation metrics such as F1 score, precision, recall, accuracy, and total execution time to measure the performance. We have conducted a comparative analysis between results derived from features containing a minimal text representation and semantic and syntactic information.
    Realising Synthetic Active Inference Agents, Part I: Epistemic Objectives and Graphical Specification Language. (arXiv:2306.08014v1 [cs.AI])
    The Free Energy Principle (FEP) is a theoretical framework for describing how (intelligent) systems self-organise into coherent, stable structures by minimising a free energy functional. Active Inference (AIF) is a corollary of the FEP that specifically details how systems that are able to plan for the future (agents) function by minimising particular free energy functionals that incorporate information-seeking components. This paper is the first in a series of two where we derive a synthetic version of AIF on free-form factor graphs. The present paper focuses on deriving a local version of the free energy functionals used for AIF. This enables us to construct a version of AIF which applies to arbitrary graphical models and interfaces with prior work on message passing algorithms. The resulting messages are derived in our companion paper. We also identify a gap in the graphical notation used for factor graphs. While factor graphs are great at expressing a generative model, they have so far been unable to specify the full optimisation problem including constraints. To solve this problem we develop Constrained Forney-style Factor Graph (CFFG) notation which permits a fully graphical description of variational inference objectives. We then proceed to show how CFFGs can be used to reconstruct prior algorithms for AIF as well as derive new ones. The latter is demonstrated by deriving an algorithm that permits direct policy inference for AIF agents, circumventing a long-standing scaling issue that has so far hindered the application of AIF in industrial settings. We demonstrate our algorithm on the classic T-maze task and show that it reproduces the information-seeking behaviour that is a hallmark feature of AIF.
    Leveraging Machine Learning for Multichain DeFi Fraud Detection. (arXiv:2306.07972v1 [q-fin.GN])
    Since the inception of permissionless blockchains with Bitcoin in 2008, it became apparent that their most well-suited use case is related to making the financial system and its advantages available to everyone seamlessly without depending on any trusted intermediaries. Smart contracts across chains provide an ecosystem of decentralized finance (DeFi), where users can interact with lending pools, Automated Market Maker (AMM) exchanges, stablecoins, derivatives, etc. with a cumulative locked value which had exceeded 160B USD. While DeFi comes with high rewards, it also carries plenty of risks. Many financial crimes have occurred over the years, making the early detection of malicious activity an issue of high priority. The proposed framework introduces an effective method for extracting a set of features from different chains, including the largest one, Ethereum. It is evaluated over an extensive novel dataset we gathered in collaboration with Covalent, containing the transactions of the most widely used DeFi protocols (23 in total, including Aave, Compound, Curve, Lido, and Yearn). Different machine learning methods, such as XGBoost and a neural network, were employed to identify fraudulent accounts interacting with DeFi, and we demonstrate that introducing novel DeFi-related features significantly improves the evaluation results, as measured by accuracy, precision, recall, F1-score, and F2-score.
    Using Interventions to Improve Out-of-Distribution Generalization of Text-Matching Recommendation Systems. (arXiv:2210.10636v2 [cs.IR] UPDATED)
    Given a user's input text, text-matching recommender systems output relevant items by comparing the input text to available items' descriptions, such as product-to-product recommendation on e-commerce platforms. As users' interests and item inventory are expected to change, it is important for a text-matching system to generalize to data shifts, a task known as out-of-distribution (OOD) generalization. However, we find that the popular approach of fine-tuning a large, base language model on paired item relevance data (e.g., user clicks) can be counter-productive for OOD generalization. For a product recommendation task, fine-tuning obtains worse accuracy than the base model when recommending items in a new category or for a future time period. To explain this generalization failure, we consider an intervention-based importance metric, which shows that a fine-tuned model captures spurious correlations and fails to learn the causal features that determine the relevance between any two text inputs. Moreover, standard methods for causal regularization do not apply in this setting, because unlike in images, there exist no universally spurious features in a text-matching task (the same token may be spurious or causal depending on the text it is being matched to). For OOD generalization on text inputs, therefore, we highlight a different goal: avoiding high importance scores for certain features. We do so using an intervention-based regularizer that constrains the causal effect of any token on the model's relevance score to be similar to the base model. Results on an Amazon product dataset and three question recommendation datasets show that our proposed regularizer improves generalization for both in-distribution and OOD evaluation, especially in difficult scenarios when the base model is not accurate.
    Reducing SO(3) Convolutions to SO(2) for Efficient Equivariant GNNs. (arXiv:2302.03655v2 [cs.LG] UPDATED)
    Graph neural networks that model 3D data, such as point clouds or atoms, are typically desired to be $SO(3)$ equivariant, i.e., equivariant to 3D rotations. Unfortunately, equivariant convolutions, which are a fundamental operation for equivariant networks, increase significantly in computational complexity as higher-order tensors are used. In this paper, we address this issue by reducing the $SO(3)$ convolutions or tensor products to mathematically equivalent convolutions in $SO(2)$. This is accomplished by aligning the node embeddings' primary axis with the edge vectors, which sparsifies the tensor product and reduces the computational complexity from $O(L^6)$ to $O(L^3)$, where $L$ is the degree of the representation. We demonstrate the potential implications of this improvement by proposing the Equivariant Spherical Channel Network (eSCN), a graph neural network utilizing our novel approach to equivariant convolutions, which achieves state-of-the-art results on the large-scale OC-20 and OC-22 datasets.
    PrivaScissors: Enhance the Privacy of Collaborative Inference through the Lens of Mutual Information. (arXiv:2306.07973v1 [cs.CR])
    Edge-cloud collaborative inference empowers resource-limited IoT devices to support deep learning applications without disclosing their raw data to the cloud server, thus preserving privacy. Nevertheless, prior research has shown that collaborative inference still results in the exposure of data and predictions from edge devices. To enhance the privacy of collaborative inference, we introduce a defense strategy called PrivaScissors, which is designed to reduce the mutual information between a model's intermediate outcomes and the device's data and predictions. We evaluate PrivaScissors's performance on several datasets in the context of diverse attacks and offer a theoretical robustness guarantee.
    iSAGE: An Incremental Version of SAGE for Online Explanation on Data Streams. (arXiv:2303.01181v2 [cs.LG] UPDATED)
    Existing methods for explainable artificial intelligence (XAI), including popular feature importance measures such as SAGE, are mostly restricted to the batch learning scenario. However, machine learning is often applied in dynamic environments, where data arrives continuously and learning must be done in an online manner. Therefore, we propose iSAGE, a time- and memory-efficient incrementalization of SAGE, which is able to react to changes in the model as well as to drift in the data-generating process. We further provide efficient feature removal methods that break (interventional) and retain (observational) feature dependencies. Moreover, we formally analyze our explanation method to show that iSAGE adheres to similar theoretical properties as SAGE. Finally, we evaluate our approach in a thorough experimental analysis based on well-established data sets and data streams with concept drift.
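    A rough sketch of the incremental flavour of the idea, assuming a squared-error loss, a fixed imputation baseline, and an exponentially weighted running mean; the paper's interventional/observational removal schemes and its exact estimator are not reproduced here:

        import numpy as np

        def isage_step(phi, model, x, y, baseline, alpha=0.01, rng=np.random.default_rng()):
            # phi: running importance per feature; baseline: replacement values
            # for "removed" features. One permutation-sampled marginal
            # contribution per feature is folded into a running mean, so the
            # explanation can track model changes and data drift.
            x_masked = baseline.copy()
            prev_loss = (model(x_masked) - y) ** 2
            for j in rng.permutation(len(x)):
                x_masked[j] = x[j]                        # reveal feature j
                loss = (model(x_masked) - y) ** 2
                phi[j] = (1 - alpha) * phi[j] + alpha * (prev_loss - loss)
                prev_loss = loss
            return phi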
    SPADE4: Sparsity and Delay Embedding based Forecasting of Epidemics. (arXiv:2211.08277v2 [cs.LG] UPDATED)
    Predicting the evolution of diseases is challenging, especially when the data availability is scarce and incomplete. The most popular tools for modelling and predicting infectious disease epidemics are compartmental models. They stratify the population into compartments according to health status and model the dynamics of these compartments using dynamical systems. However, these predefined systems may not capture the true dynamics of the epidemic due to the complexity of disease transmission and human interactions. In order to overcome this drawback, we propose Sparsity and Delay Embedding based Forecasting (SPADE4) for predicting epidemics. SPADE4 predicts the future trajectory of an observable variable without knowledge of the other variables or the underlying system. We use a random features model with sparse regression to handle the data scarcity issue and employ Takens' delay embedding theorem to capture the nature of the underlying system from the observed variable. We show that our approach outperforms compartmental models when applied to both simulated and real data.
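    A compact sketch of the two named ingredients, with ridge regression standing in for the paper's sparse regression and all hyperparameters assumed:

        import numpy as np

        def delay_embed(series, dim, tau=1):
            # Takens-style delay embedding: row t is (x_t, x_{t+tau}, ..., x_{t+(dim-1)tau})
            rows = len(series) - (dim - 1) * tau
            return np.column_stack([series[i * tau: i * tau + rows] for i in range(dim)])

        def fit_random_features(X, y, n_feat=200, lam=1e-2, rng=np.random.default_rng(0)):
            # Random-features regression on the embedded coordinates.
            W = rng.normal(size=(X.shape[1], n_feat))
            b = rng.uniform(0, 2 * np.pi, n_feat)
            Phi = np.cos(X @ W + b)
            coef = np.linalg.solve(Phi.T @ Phi + lam * np.eye(n_feat), Phi.T @ y)
            return lambda Z: np.cos(Z @ W + b) @ coef

    In use, y would be the next value of the observable after each embedded window, giving a one-step-ahead forecaster of the single observed variable.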
    Large language models predict human sensory judgments across six modalities. (arXiv:2302.01308v2 [cs.CL] UPDATED)
    Determining the extent to which the perceptual world can be recovered from language is a longstanding problem in philosophy and cognitive science. We show that state-of-the-art large language models can unlock new insights into this problem by providing a lower bound on the amount of perceptual information that can be extracted from language. Specifically, we elicit pairwise similarity judgments from GPT models across six psychophysical datasets. We show that the judgments are significantly correlated with human data across all domains, recovering well-known representations like the color wheel and pitch spiral. Surprisingly, we find that a model (GPT-4) co-trained on vision and language does not necessarily lead to improvements specific to the visual modality. To study the influence of specific languages on perception, we also apply the models to a multilingual color-naming task. We find that GPT-4 replicates cross-linguistic variation in English and Russian, illuminating the interaction of language and perception.
    Localised Adaptive Spatial-Temporal Graph Neural Network. (arXiv:2306.06930v2 [cs.LG] UPDATED)
    Spatial-temporal graph models are prevailing for abstracting and modelling spatial and temporal dependencies. In this work, we ask the following question: whether and to what extent can we localise spatial-temporal graph models? We limit our scope to adaptive spatial-temporal graph neural networks (ASTGNNs), the state-of-the-art model architecture. Our approach to localisation involves sparsifying the spatial graph adjacency matrices. To this end, we propose Adaptive Graph Sparsification (AGS), a graph sparsification algorithm which successfully enables the localisation of ASTGNNs to an extreme extent (full localisation). We apply AGS to two distinct ASTGNN architectures and nine spatial-temporal datasets. Intriguingly, we observe that spatial graphs in ASTGNNs can be sparsified by over 99.5% without any decline in test accuracy. Furthermore, even when ASTGNNs are fully localised, becoming graph-less and purely temporal, we record no drop in accuracy for the majority of tested datasets, with only minor accuracy deterioration observed in the remaining datasets. However, when the partially or fully localised ASTGNNs are reinitialised and retrained on the same data, there is a considerable and consistent drop in accuracy. Based on these observations, we reckon that (i) in the tested data, the information provided by the spatial dependencies is primarily included in the information provided by the temporal dependencies and, thus, can be essentially ignored for inference; and (ii) although the spatial dependencies provide redundant information, they are vital for the effective training of ASTGNNs and thus cannot be ignored during training. Furthermore, the localisation of ASTGNNs holds the potential to reduce the heavy computation overhead required on large-scale spatial-temporal data and further enable the distributed deployment of ASTGNNs.
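    AGS itself selects edges by a learned criterion; a magnitude-thresholding stand-in that keeps only the strongest 0.5% of entries (the figure quoted above) looks like:

        import torch

        def sparsify_adjacency(A, keep_frac=0.005):
            # Keep only the top keep_frac fraction of entries in a learned
            # adaptive adjacency matrix, zeroing out the rest.
            k = max(1, int(keep_frac * A.numel()))
            thresh = A.flatten().topk(k).values.min()
            return A * (A >= thresh).to(A.dtype)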
    Directional Privacy for Deep Learning. (arXiv:2211.04686v2 [cs.LG] UPDATED)
    Differentially Private Stochastic Gradient Descent (DP-SGD) is a key method for applying privacy in the training of deep learning models. DP-SGD applies isotropic Gaussian noise to gradients during training, which can perturb these gradients in any direction, damaging utility. Metric DP, however, can provide alternative mechanisms based on arbitrary metrics that might be more suitable for preserving utility. In this paper, we apply directional privacy, via a mechanism based on the von Mises-Fisher (VMF) distribution, to perturb gradients in terms of angular distance so that gradient direction is broadly preserved. We show that this provides both $\epsilon$-DP and $\epsilon d$-privacy for deep learning training, rather than the $(\epsilon, \delta)$-privacy of the Gaussian mechanism; we observe that the $\epsilon d$-privacy guarantee does not require a $\delta>0$ term but degrades smoothly according to the dissimilarity of the input gradients. As $\epsilon$s between these different frameworks cannot be directly compared, we examine empirical privacy calibration mechanisms that go beyond previous work on empirically calibrating privacy within standard DP frameworks using membership inference attacks (MIA); we show that a combination of enhanced MIA and reconstruction attacks provides a suitable method for privacy calibration. Experiments on key datasets then indicate that the VMF mechanism can outperform the Gaussian in the utility-privacy trade-off. In particular, our experiments provide a direct comparison of privacy between the two approaches in terms of their ability to defend against reconstruction and membership inference.
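    A sketch of the mechanism's shape, assuming SciPy >= 1.11 (which ships scipy.stats.vonmises_fisher); the paper's calibration of the concentration kappa to the privacy budget is omitted:

        import numpy as np
        from scipy.stats import vonmises_fisher   # SciPy >= 1.11

        def vmf_perturb(grad, kappa, rng=None):
            # Sample a unit vector concentrated around the gradient's direction
            # and restore the original magnitude: direction is privatized by
            # angular noise while the norm is left untouched.
            norm = np.linalg.norm(grad)
            mu = grad / norm
            noisy_dir = vonmises_fisher(mu, kappa).rvs(random_state=rng)
            return norm * noisy_dir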
    Self-supervised Equality Embedded Deep Lagrange Dual for Approximate Constrained Optimization. (arXiv:2306.06674v2 [math.OC] UPDATED)
    Conventional solvers are often computationally expensive for constrained optimization, particularly in large-scale and time-critical problems. While this leads to a growing interest in using neural networks (NNs) as fast optimal solution approximators, incorporating the constraints with NNs is challenging. In this regard, we propose deep Lagrange dual with equality embedding (DeepLDE), a framework that learns to find an optimal solution without using labels. To ensure feasible solutions, we embed equality constraints into the NNs and train the NNs using the primal-dual method to impose inequality constraints. Furthermore, we prove the convergence of DeepLDE and show that the primal-dual learning method alone cannot ensure equality constraints without the help of equality embedding. Simulation results on convex, non-convex, and AC optimal power flow (AC-OPF) problems show that the proposed DeepLDE achieves the smallest optimality gap among all the NN-based approaches while always ensuring feasible solutions. Furthermore, the computation time of the proposed method is about 5 to 250 times faster than DC3 and the conventional solvers in solving constrained convex, non-convex optimization, and/or AC-OPF.
    InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning. (arXiv:2305.06500v2 [cs.CV] UPDATED)
    Large-scale pre-training and instruction tuning have been successful at creating general-purpose language models with broad competence. However, building general-purpose vision-language models is challenging due to the rich input distributions and task diversity resulting from the additional visual input. Although vision-language pretraining has been widely studied, vision-language instruction tuning remains under-explored. In this paper, we conduct a systematic and comprehensive study on vision-language instruction tuning based on the pretrained BLIP-2 models. We gather 26 publicly available datasets, covering a wide variety of tasks and capabilities, and transform them into instruction tuning format. Additionally, we introduce an instruction-aware Query Transformer, which extracts informative features tailored to the given instruction. Trained on 13 held-in datasets, InstructBLIP attains state-of-the-art zero-shot performance across all 13 held-out datasets, substantially outperforming BLIP-2 and larger Flamingo models. Our models also lead to state-of-the-art performance when finetuned on individual downstream tasks (e.g., 90.7% accuracy on ScienceQA questions with image contexts). Furthermore, we qualitatively demonstrate the advantages of InstructBLIP over concurrent multimodal models. All InstructBLIP models are open-sourced at https://github.com/salesforce/LAVIS/tree/main/projects/instructblip.
    MixupE: Understanding and Improving Mixup from Directional Derivative Perspective. (arXiv:2212.13381v3 [cs.LG] UPDATED)
    Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels. This technique is known to improve the generalization performance in many learning paradigms and applications. In this work, we first analyze Mixup and show that it implicitly regularizes infinitely many directional derivatives of all orders. Based on this new insight, we propose an improved version of Mixup, theoretically justified to deliver better generalization performance than the vanilla Mixup. To demonstrate the effectiveness of the proposed method, we conduct experiments across various domains such as images, tabular data, speech, and graphs. Our results show that the proposed method improves Mixup across multiple datasets using a variety of architectures, for instance, exhibiting an improvement over Mixup by 0.8% in ImageNet top-1 accuracy.
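    For reference, here is the vanilla Mixup that the analysis starts from (the Beta concentration alpha=0.2 is a common default, not the paper's setting); MixupE's added directional-derivative regularizer is not reproduced:

        import numpy as np

        def mixup(x, y, alpha=0.2, rng=np.random.default_rng()):
            # x: (batch, ...) inputs, y: (batch, classes) one-hot labels.
            # Interpolate each example with a randomly paired partner, using
            # the same coefficient for inputs and labels.
            lam = rng.beta(alpha, alpha)
            perm = rng.permutation(len(x))
            return lam * x + (1 - lam) * x[perm], lam * y + (1 - lam) * y[perm]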
    Tool Learning with Foundation Models. (arXiv:2304.08354v2 [cs.CL] UPDATED)
    Humans possess an extraordinary ability to create and utilize tools, allowing them to overcome physical limitations and explore new frontiers. With the advent of foundation models, AI systems have the potential to be equally adept in tool use as humans. This paradigm, i.e., tool learning with foundation models, combines the strengths of specialized tools and foundation models to achieve enhanced accuracy, efficiency, and automation in problem-solving. Despite its immense potential, there is still a lack of a comprehensive understanding of key challenges, opportunities, and future endeavors in this field. To this end, we present a systematic investigation of tool learning in this paper. We first introduce the background of tool learning, including its cognitive origins, the paradigm shift of foundation models, and the complementary roles of tools and models. Then we recapitulate existing tool learning research into tool-augmented and tool-oriented learning. We formulate a general tool learning framework: starting from understanding the user instruction, models should learn to decompose a complex task into several subtasks, dynamically adjust their plan through reasoning, and effectively conquer each sub-task by selecting appropriate tools. We also discuss how to train models for improved tool-use capabilities and facilitate the generalization in tool learning. Considering the lack of a systematic tool learning evaluation in prior works, we experiment with 18 representative tools and show the potential of current foundation models in skillfully utilizing tools. Finally, we discuss several open problems that require further investigation for tool learning. Overall, we hope this paper could inspire future research in integrating tools with foundation models.
    Unprocessing Seven Years of Algorithmic Fairness. (arXiv:2306.07261v2 [cs.LG] UPDATED)
    Seven years ago, researchers proposed a postprocessing method to equalize the error rates of a model across different demographic groups. The work launched hundreds of papers purporting to improve over the postprocessing baseline. We empirically evaluate these claims through thousands of model evaluations on several tabular datasets. We find that the fairness-accuracy Pareto frontier achieved by postprocessing contains all other methods we were feasibly able to evaluate. In doing so, we address two common methodological errors that have confounded previous observations. One relates to the comparison of methods with different unconstrained base models. The other concerns methods achieving different levels of constraint relaxation. At the heart of our study is a simple idea we call unprocessing that roughly corresponds to the inverse of postprocessing. Unprocessing allows for a direct comparison of methods using different underlying models and levels of relaxation. Interpreting our findings, we recall a widely overlooked theoretical argument, presented seven years ago, that accurately predicted what we observe.
    NF4 Isn't Information Theoretically Optimal (and that's Good). (arXiv:2306.06965v2 [cs.LG] UPDATED)
    This note shares some simple calculations and experiments related to absmax-based blockwise quantization, as used in Dettmers et al., 2023. Their proposed NF4 data type is said to be information theoretically optimal for representing normally distributed weights. I show that this can't quite be the case, as the distribution of the values to be quantized depends on the block-size. I attempt to apply these insights to derive an improved code based on minimizing the expected L1 reconstruction error, rather than the quantile-based method. This leads to improved performance for larger quantization block sizes, while both codes perform similarly at smaller block sizes.
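    A minimal sketch of absmax blockwise quantization and the L1 reconstruction error it induces; the block size and the code values are assumptions (e.g. the 16 NF4 levels would be passed in as `code`):

        import numpy as np

        def absmax_block_quant(w, code, block_size=64):
            # w: flat float weights (length a multiple of block_size);
            # code: sorted 1-D array of quantization levels in [-1, 1].
            blocks = w.reshape(-1, block_size)
            scale = np.abs(blocks).max(axis=1, keepdims=True)      # absmax per block
            normed = blocks / scale                                 # values in [-1, 1]
            idx = np.abs(normed[..., None] - code).argmin(axis=-1)  # nearest level
            dequant = code[idx] * scale
            return idx, scale, np.abs(dequant - blocks).mean()      # mean L1 error

    Because each block is normalized by its own maximum, the distribution of `normed` (and hence the best code) depends on the block size, which is the note's central observation.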
    Tight Certification of Adversarially Trained Neural Networks via Nonconvex Low-Rank Semidefinite Relaxations. (arXiv:2211.17244v3 [cs.LG] UPDATED)
    Adversarial training is well-known to produce high-quality neural network models that are empirically robust against adversarial perturbations. Nevertheless, once a model has been adversarially trained, one often desires a certification that the model is truly robust against all future attacks. Unfortunately, when faced with adversarially trained models, all existing approaches have significant trouble making certifications that are strong enough to be practically useful. Linear programming (LP) techniques in particular face a "convex relaxation barrier" that prevents them from making high-quality certifications, even after refinement with mixed-integer linear programming (MILP) and branch-and-bound (BnB) techniques. In this paper, we propose a nonconvex certification technique, based on a low-rank restriction of a semidefinite programming (SDP) relaxation. The nonconvex relaxation makes strong certifications comparable to much more expensive SDP methods, while optimizing over dramatically fewer variables, comparable to much weaker LP methods. Despite nonconvexity, we show how off-the-shelf local optimization algorithms can be used to achieve and to certify global optimality in polynomial time. Our experiments find that the nonconvex relaxation almost completely closes the gap towards exact certification of adversarially trained models.
    Stable Deep MRI Reconstruction using Generative Priors. (arXiv:2210.13834v3 [eess.IV] UPDATED)
    Data-driven approaches recently achieved remarkable success in magnetic resonance imaging (MRI) reconstruction, but integration into clinical routine remains challenging due to a lack of generalizability and interpretability. In this paper, we address these challenges in a unified framework based on generative image priors. We propose a novel deep neural network based regularizer which is trained in a generative setting on reference magnitude images only. After training, the regularizer encodes higher-level domain statistics which we demonstrate by synthesizing images without data. Embedding the trained model in a classical variational approach yields high-quality reconstructions irrespective of the sub-sampling pattern. In addition, the model shows stable behavior when confronted with out-of-distribution data in the form of contrast variation. Furthermore, a probabilistic interpretation provides a distribution of reconstructions and hence allows uncertainty quantification. To reconstruct parallel MRI, we propose a fast algorithm to jointly estimate the image and the sensitivity maps. The results demonstrate competitive performance, on par with state-of-the-art end-to-end deep learning methods, while preserving the flexibility with respect to sub-sampling patterns and allowing for uncertainty quantification.
    Improved disentangled speech representations using contrastive learning in factorized hierarchical variational autoencoder. (arXiv:2211.08191v2 [eess.AS] UPDATED)
    Leveraging the fact that speaker identity and content vary on different time scales, the factorized hierarchical variational autoencoder (FHVAE) uses different latent variables to symbolize these two attributes. Disentanglement of these attributes is carried out by different prior settings of the corresponding latent variables. For the prior of the speaker identity variable, FHVAE assumes a Gaussian distribution with an utterance-scale varying mean and a fixed variance. By setting a small fixed variance, the training process encourages identity variables within one utterance to gather close to the mean of their prior. However, this constraint is relatively weak, as the mean of the prior changes between utterances. Therefore, we introduce contrastive learning into the FHVAE framework, to make the speaker identity variables of the same speaker gather together while distancing themselves as far as possible from those of other speakers. The model structure has not been changed in this work, only the training process, so no additional cost is needed during testing. Voice conversion has been chosen as the application in this paper. Latent variable evaluations include speaker verification and identification for the speaker identity variable, and speech recognition for the content variable. Furthermore, assessments of voice conversion performance are based on fake speech detection experiments. Results show that the proposed method improves both speaker identity and content feature extraction compared to FHVAE, and achieves better conversion performance than the baseline.
    The Neural Covariance SDE: Shaped Infinite Depth-and-Width Networks at Initialization. (arXiv:2206.02768v3 [stat.ML] UPDATED)
    The logit outputs of a feedforward neural network at initialization are conditionally Gaussian, given a random covariance matrix defined by the penultimate layer. In this work, we study the distribution of this random matrix. Recent work has shown that shaping the activation function as network depth grows large is necessary for this covariance matrix to be non-degenerate. However, the current infinite-width-style understanding of this shaping method is unsatisfactory for large depth: infinite-width analyses ignore the microscopic fluctuations from layer to layer, but these fluctuations accumulate over many layers. To overcome this shortcoming, we study the random covariance matrix in the shaped infinite-depth-and-width limit. We identify the precise scaling of the activation function necessary to arrive at a non-trivial limit, and show that the random covariance matrix is governed by a stochastic differential equation (SDE) that we call the Neural Covariance SDE. Using simulations, we show that the SDE closely matches the distribution of the random covariance matrix of finite networks. Additionally, we recover an if-and-only-if condition for exploding and vanishing norms of large shaped networks based on the activation function.
    On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline. (arXiv:2212.05749v2 [cs.LG] UPDATED)
    In this paper, we examine the effectiveness of pre-training for visuo-motor control tasks. We revisit a simple Learning-from-Scratch (LfS) baseline that incorporates data augmentation and a shallow ConvNet, and find that this baseline is surprisingly competitive with recent approaches (PVR, MVP, R3M) that leverage frozen visual representations trained on large-scale vision datasets -- across a variety of algorithms, task domains, and metrics in simulation and on a real robot. Our results demonstrate that these methods are hindered by a significant domain gap between the pre-training datasets and current benchmarks for visuo-motor control, which is alleviated by finetuning. Based on our findings, we provide recommendations for future research in pre-training for control and hope that our simple yet strong baseline will aid in accurately benchmarking progress in this area.
    Dynamic Interval Restrictions on Action Spaces in Deep Reinforcement Learning for Obstacle Avoidance. (arXiv:2306.08008v1 [cs.LG])
    Deep reinforcement learning algorithms typically act on the same set of actions. However, this is not sufficient for a wide range of real-world applications where different subsets are available at each step. In this thesis, we consider the problem of interval restrictions as they occur in pathfinding with dynamic obstacles. When actions that lead to collisions are avoided, the continuous action space is split into variable parts. Recent research either makes strong assumptions on the number of intervals, is limited to convex subsets, or learns the available actions from the observations. Therefore, we propose two approaches that are independent of the state of the environment, by extending parameterized reinforcement learning and ConstraintNet to handle an arbitrary number of intervals. We demonstrate their performance in an obstacle avoidance task and compare the methods to penalties, projection, replacement, as well as discrete and continuous masking from the literature. The results suggest that discrete masking of action-values is the only effective method when constraints did not emerge during training. When restrictions are learned, the decision between projection, masking, and our ConstraintNet modification seems to depend on the task at hand. We compare the results with varying complexity and give directions for future work.
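    A sketch of the discrete action-value masking baseline highlighted in the results, assuming the continuous action has been discretized into bins (the discretization itself is our assumption):

        import numpy as np

        def masked_greedy_action(q_values, action_bins, intervals):
            # q_values: Q-estimate per discretized action; action_bins: the
            # action value each bin represents; intervals: list of (low, high)
            # allowed ranges at the current step.
            allowed = np.zeros(len(action_bins), dtype=bool)
            for low, high in intervals:
                allowed |= (action_bins >= low) & (action_bins <= high)
            return action_bins[np.where(allowed, q_values, -np.inf).argmax()]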
    Better Generalization with Semantic IDs: A case study in Ranking for Recommendations. (arXiv:2306.08121v1 [cs.IR])
    Training good representations for items is critical in recommender models. Typically, an item is assigned a unique, randomly generated ID and is commonly represented by learning an embedding corresponding to the value of the random ID. Although widely used, this approach has limitations when the number of items is large and items are power-law distributed -- typical characteristics of real-world recommendation systems. This leads to the item cold-start problem, where the model is unable to make reliable inferences for tail and previously unseen items. Removing these ID features and their learned embeddings altogether to combat the cold-start issue severely degrades the recommendation quality. Content-based item embeddings are more reliable, but they are expensive to store and use, particularly for users' past item interaction sequences. In this paper, we use Semantic IDs, compact discrete item representations learned from content embeddings using RQ-VAE, which capture a hierarchy of concepts in items. We showcase how we use them as a replacement for item IDs in a resource-constrained ranking model used in an industrial-scale video sharing platform. Moreover, we show how Semantic IDs improve the generalization ability of our system without sacrificing top-level metrics.
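    The discretization at the heart of RQ-VAE is residual quantization; a sketch of that step in isolation, with codebook learning and the encoder/decoder omitted and shapes assumed:

        import numpy as np

        def semantic_id(x, codebooks):
            # x: (d,) content embedding of an item; codebooks: list of (K, d)
            # arrays, one per level. Each level quantizes the residual the
            # previous levels left behind, so earlier indices encode coarser
            # concepts and later ones refine them.
            residual, sid = x.astype(float).copy(), []
            for cb in codebooks:
                idx = int(np.linalg.norm(cb - residual, axis=1).argmin())
                sid.append(idx)
                residual = residual - cb[idx]
            return tuple(sid)   # used in place of the random item ID in the ranker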
    How Robust is your Fair Model? Exploring the Robustness of Diverse Fairness Strategies. (arXiv:2207.04581v3 [cs.LG] UPDATED)
    With the introduction of machine learning in high-stakes decision making, ensuring algorithmic fairness has become an increasingly important problem to solve. In response to this, many mathematical definitions of fairness have been proposed, and a variety of optimisation techniques have been developed, all designed to maximise a defined notion of fairness. However, fair solutions are reliant on the quality of the training data, and can be highly sensitive to noise. Recent studies have shown that robustness (the ability for a model to perform well on unseen data) plays a significant role in the type of strategy that should be used when approaching a new problem and, hence, measuring the robustness of these strategies has become a fundamental problem. In this work, we therefore propose a new criterion to measure the robustness of various fairness optimisation strategies - the robustness ratio. We conduct multiple extensive experiments on five benchmark fairness data sets using three of the most popular fairness strategies with respect to four of the most popular definitions of fairness. Our experiments empirically show that fairness methods that rely on threshold optimisation are very sensitive to noise in all the evaluated data sets, despite mostly outperforming other methods. This is in contrast to the other two methods, which are less fair for low noise scenarios but fairer for high noise ones. To the best of our knowledge, we are the first to quantitatively evaluate the robustness of fairness optimisation strategies. This can potentially serve as a guideline in choosing the most suitable fairness strategy for various data sets.
    FP8 versus INT8 for efficient deep learning inference. (arXiv:2303.17951v2 [cs.LG] UPDATED)
    Recently, the idea of using FP8 as a number format for neural network training has been floating around the deep learning world. Given that most training is currently conducted with entire networks in FP32, or sometimes FP16 with mixed-precision, the step to having some parts of a network run in FP8 with 8-bit weights is an appealing potential speed-up for the generally costly and time-intensive training procedures in deep learning. A natural question arises regarding what this development means for efficient inference on edge devices. In the efficient inference device world, workloads are frequently executed in INT8. Sometimes going even as low as INT4 when efficiency calls for it. In this whitepaper, we compare the performance for both the FP8 and INT formats for efficient on-device inference. We theoretically show the difference between the INT and FP formats for neural networks and present a plethora of post-training quantization and quantization-aware-training results to show how this theory translates to practice. We also provide a hardware analysis showing that the FP formats are somewhere between 50-180% less efficient in terms of compute in dedicated hardware than the INT format. Based on our research and a read of the research field, we conclude that although the proposed FP8 format could be good for training, the results for inference do not warrant a dedicated implementation of FP8 in favor of INT8 for efficient inference. We show that our results are mostly consistent with previous findings but that important comparisons between the formats have thus far been lacking. Finally, we discuss what happens when FP8-trained networks are converted to INT8 and conclude with a brief discussion on the most efficient way for on-device deployment and an extensive suite of INT8 results for many models.
    Data Augmentation for Seizure Prediction with Generative Diffusion Model. (arXiv:2306.08256v1 [eess.SP])
    Objective: Seizure prediction is of great importance to improve the life of patients. The focal point is to distinguish preictal states from interictal ones. With the development of machine learning, seizure prediction methods have achieved significant progress. However, the severe imbalance between preictal and interictal data still poses a great challenge, restricting the performance of classifiers. Data augmentation is an intuitive way to solve this problem. Existing data augmentation methods generate samples by overlapping or recombining data. The distribution of generated samples is limited by the original data, because such transformations cannot fully explore the feature space and offer new information. As the epileptic EEG representation varies among seizures, these generated samples cannot provide enough diversity to achieve high performance on a new seizure. As a consequence, we propose DiffEEG, a novel data augmentation method based on a diffusion model. Methods: Diffusion models are a class of generative models that consist of two processes. Specifically, in the diffusion process, the model adds noise to the input EEG sample step by step and converts the noisy sample into random noise, exploring the distribution of the data by minimizing the loss between the output and the added noise. In the denoising process, the model samples synthetic data by removing the noise gradually, diffusing the data distribution to outward areas and narrowing the distance between different clusters. Results: We compared DiffEEG with existing methods and integrated them into three representative classifiers. The experiments indicate that DiffEEG further improves performance and shows superiority over existing methods. Conclusion: This paper proposes a novel and effective method to solve the imbalance problem and demonstrates the effectiveness and generality of our method.  ( 3 min )
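    The diffusion process described above follows the standard DDPM closed form; a generic sketch (not DiffEEG's exact network or noise schedule):

        import numpy as np

        def forward_diffuse(x0, t, betas, rng=np.random.default_rng()):
            # x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, where
            # abar_t is the cumulative product of (1 - beta). The model is
            # trained to predict eps from (x_t, t); sampling synthetic EEG
            # reverses this noising process step by step.
            abar = np.cumprod(1.0 - betas)[t]
            eps = rng.standard_normal(x0.shape)
            return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps, eps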
    Private Federated Learning Without a Trusted Server: Optimal Algorithms for Convex Losses. (arXiv:2106.09779v8 [cs.LG] UPDATED)
    This paper studies federated learning (FL)--especially cross-silo FL--with data from people who do not trust the server or other silos. In this setting, each silo (e.g. hospital) has data from different people (e.g. patients) and must maintain the privacy of each person's data (e.g. medical record), even if the server or other silos act as adversarial eavesdroppers. This requirement motivates the study of Inter-Silo Record-Level Differential Privacy (ISRL-DP), which requires silo i's communications to satisfy record/item-level differential privacy (DP). ISRL-DP ensures that the data of each person (e.g. patient) in silo i (e.g. hospital i) cannot be leaked. ISRL-DP is different from well-studied privacy notions. Central and user-level DP assume that people trust the server/other silos. On the other end of the spectrum, local DP assumes that people do not trust anyone at all (even their own silo). Sitting between central and local DP, ISRL-DP makes the realistic assumption (in cross-silo FL) that people trust their own silo, but not the server or other silos. In this work, we provide tight (up to logarithms) upper and lower bounds for ISRL-DP FL with convex/strongly convex loss functions and homogeneous (i.i.d.) silo data. Remarkably, we show that similar bounds are attainable for smooth losses with arbitrary heterogeneous silo data distributions, via an accelerated ISRL-DP algorithm. We also provide tight upper and lower bounds for ISRL-DP federated empirical risk minimization, and use acceleration to attain the optimal bounds in fewer rounds of communication than the state-of-the-art. Finally, with a secure "shuffler" to anonymize silo messages (but without a trusted server), our algorithm attains the optimal central DP rates under more practical trust assumptions. Numerical experiments show favorable privacy-accuracy tradeoffs for our algorithm in classification and regression tasks.  ( 3 min )
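    The shape of an ISRL-DP message is DP-SGD-like; a per-silo sketch, with the calibration of the noise to a given (epsilon, delta) and the paper's accelerated algorithm omitted:

        import numpy as np

        def silo_dp_message(per_record_grads, clip, noise_mult, rng=np.random.default_rng()):
            # Record-level DP inside one silo: clip each record's gradient,
            # average, and add Gaussian noise before the message leaves the
            # silo, so neither the server nor other silos can recover any
            # individual record (e.g. a patient's medical data).
            norms = np.linalg.norm(per_record_grads, axis=1, keepdims=True)
            clipped = per_record_grads * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
            noise = rng.normal(0.0, noise_mult * clip / len(per_record_grads),
                               size=per_record_grads.shape[1])
            return clipped.mean(axis=0) + noise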
    Parallel Successive Learning for Dynamic Distributed Model Training over Heterogeneous Wireless Networks. (arXiv:2202.02947v6 [cs.LG] UPDATED)
    Federated learning (FedL) has emerged as a popular technique for distributing model training over a set of wireless devices, via iterative local updates (at devices) and global aggregations (at the server). In this paper, we develop parallel successive learning (PSL), which expands the FedL architecture along three dimensions: (i) Network, allowing decentralized cooperation among the devices via device-to-device (D2D) communications. (ii) Heterogeneity, interpreted at three levels: (ii-a) Learning: PSL considers heterogeneous number of stochastic gradient descent iterations with different mini-batch sizes at the devices; (ii-b) Data: PSL presumes a dynamic environment with data arrival and departure, where the distributions of local datasets evolve over time, captured via a new metric for model/concept drift. (ii-c) Device: PSL considers devices with different computation and communication capabilities. (iii) Proximity, where devices have different distances to each other and the access point. PSL considers the realistic scenario where global aggregations are conducted with idle times in-between them for resource efficiency improvements, and incorporates data dispersion and model dispersion with local model condensation into FedL. Our analysis sheds light on the notion of cold vs. warmed up models, and model inertia in distributed machine learning. We then propose network-aware dynamic model tracking to optimize the model learning vs. resource efficiency tradeoff, which we show is an NP-hard signomial programming problem. We finally solve this problem through proposing a general optimization solver. Our numerical results reveal new findings on the interdependencies between the idle times in-between the global aggregations, model/concept drift, and D2D cooperation configuration.  ( 3 min )
    Topological Experience Replay. (arXiv:2203.15845v2 [cs.LG] UPDATED)
State-of-the-art deep Q-learning methods update Q-values using state transition tuples sampled from the experience replay buffer. This strategy often samples data uniformly at random, or prioritizes sampling based on measures such as the temporal difference (TD) error. Such sampling strategies can be inefficient at learning the Q-function because a state's Q-value depends on the Q-values of successor states. If the data sampling strategy ignores the precision of the Q-value estimate of the next state, it can lead to useless and often incorrect updates to the Q-values. To mitigate this issue, we organize the agent's experience into a graph that explicitly tracks the dependency between Q-values of states. Each edge in the graph represents a transition between two states by executing a single action. We perform value backups via a breadth-first search that expands vertices in the graph starting from the set of terminal states and successively moving backward. We empirically show that our method is substantially more data-efficient than several baselines on a diverse range of goal-reaching tasks. Notably, the proposed method also outperforms baselines that consume more batches of training experience and operates on high-dimensional observational data such as images.  ( 2 min )
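As an illustration of the backup ordering only (not the full method), the sketch below builds the reverse transition graph and propagates values breadth-first from terminal states, so a state is updated only after its successors have estimates; the greedy max backup and all names are our simplifications.

```python
from collections import deque, defaultdict

def bfs_value_backup(transitions, terminal_states, gamma=0.99):
    """transitions: iterable of (s, a, r, s_next, done) tuples.
    Back up values breadth-first from terminal states over the reverse graph."""
    reverse = defaultdict(list)              # s_next -> incoming (s, r, done)
    for s, _, r, s_next, done in transitions:
        reverse[s_next].append((s, r, done))
    v = defaultdict(float)                   # value estimates, 0 by default
    frontier = deque(terminal_states)
    visited = set(terminal_states)
    while frontier:
        s_next = frontier.popleft()
        for s, r, done in reverse[s_next]:
            target = r if done else r + gamma * v[s_next]
            v[s] = max(v[s], target)         # greedy backup over seen actions
            if s not in visited:
                visited.add(s)
                frontier.append(s)
    return v
```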
    Distributionally Robust Data Join. (arXiv:2202.05797v2 [cs.LG] UPDATED)
Suppose we are given two datasets: a labeled dataset and an unlabeled dataset that also has additional auxiliary features not present in the first dataset. What is the most principled way to use these datasets together to construct a predictor? The answer should depend upon whether these datasets are generated by the same or different distributions over their mutual feature sets, and how similar the test distribution will be to either of those distributions. In many applications, the two datasets will likely follow different distributions, but both may be close to the test distribution. We introduce the problem of building a predictor which minimizes the maximum loss over all probability distributions over the original features, auxiliary features, and binary labels, whose Wasserstein distance is $r_1$ away from the empirical distribution over the labeled dataset and $r_2$ away from that of the unlabeled dataset. This can be thought of as a generalization of distributionally robust optimization (DRO), which allows for two data sources, one of which is unlabeled and may contain auxiliary features.  ( 2 min )
    Bilinear value networks. (arXiv:2204.13695v2 [cs.AI] UPDATED)
The dominant framework for off-policy multi-goal reinforcement learning involves estimating a goal-conditioned Q-value function. When learning to achieve multiple goals, data efficiency is intimately connected with the generalization of the Q-function to new goals. The de-facto paradigm is to approximate Q(s, a, g) using monolithic neural networks. To improve the generalization of the Q-function, we propose a bilinear decomposition that represents the Q-value via a low-rank approximation in the form of a dot product between two vector fields. The first vector field, f(s, a), captures the environment's local dynamics at the state s; whereas the second component, \phi(s, g), captures the global relationship between the current state and the goal. We show that our bilinear decomposition scheme substantially improves data efficiency, and has superior transfer to out-of-distribution goals compared to prior methods. Empirical evidence is provided on the simulated Fetch robot task-suite and dexterous manipulation with a Shadow hand.  ( 2 min )
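The decomposition is straightforward to express. A minimal PyTorch sketch of the idea, with hidden sizes and network shapes chosen arbitrarily (they are our assumptions, not the paper's):

```python
import torch
import torch.nn as nn

class BilinearQ(nn.Module):
    """Q(s, a, g) = f(s, a) . phi(s, g): a low-rank, dot-product
    decomposition of the goal-conditioned Q-function."""
    def __init__(self, s_dim, a_dim, g_dim, k=64):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(s_dim + a_dim, 256), nn.ReLU(),
                               nn.Linear(256, k))
        self.phi = nn.Sequential(nn.Linear(s_dim + g_dim, 256), nn.ReLU(),
                                 nn.Linear(256, k))

    def forward(self, s, a, g):
        fsa = self.f(torch.cat([s, a], dim=-1))    # local dynamics at s
        psg = self.phi(torch.cat([s, g], dim=-1))  # state-goal relationship
        return (fsa * psg).sum(dim=-1)             # dot product -> scalar Q
```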
    SC2 Benchmark: Supervised Compression for Split Computing. (arXiv:2203.08875v2 [cs.LG] UPDATED)
    With the increasing demand for deep learning models on mobile devices, splitting neural network computation between the device and a more powerful edge server has become an attractive solution. However, existing split computing approaches often underperform compared to a naive baseline of remote computation on compressed data. Recent studies propose learning compressed representations that contain more relevant information for supervised downstream tasks, showing improved tradeoffs between compressed data size and supervised performance. However, existing evaluation metrics only provide an incomplete picture of split computing. This study introduces supervised compression for split computing (SC2) and proposes new evaluation criteria: minimizing computation on the mobile device, minimizing transmitted data size, and maximizing model accuracy. We conduct a comprehensive benchmark study using 10 baseline methods, three computer vision tasks, and over 180 trained models, and discuss various aspects of SC2. We also release sc2bench, a Python package for future research on SC2. Our proposed metrics and package will help researchers better understand the tradeoffs of supervised compression in split computing.  ( 2 min )
    Imagery Tracking of Sun Activity Using 2D Circular Kernel Time Series Transformation, Entropy Measures and Machine Learning Approaches. (arXiv:2306.08270v1 [astro-ph.SR])
The sun is highly complex in nature, and its observatory imagery is one of the most important sources of information about solar activity and space and Earth weather conditions. NASA's Solar Dynamics Observatory captures approximately 70,000 images of solar activity per day, and continuous visual inspection of these solar observatory images is challenging. In this study, we developed a technique for tracking the sun's activity using a 2D circular kernel time series transformation, statistical and entropy measures, and machine learning approaches. The technique involves transforming a solar observatory image section into a 1-dimensional time series (1-DTS), from which features are extracted either via statistical and entropy measures (Approach 1) or via direct classification (Approach 2) for machine learning classification into 'solar storm' and 'no storm'. We found that the potential accuracy of the model in tracking the activity of the sun is approximately 0.981 for Approach 1 and 0.999 for Approach 2. The developed approach is stable under rotational transformations of the solar observatory image. When training on the original dataset for Approach 1, the match index (T90) of the distribution of solar storm areas reaches T90 ~ 0.993, and T90 ~ 0.951 for Approach 2. In addition, when using the extended training base, the match indices increased to T90 ~ 0.994 and T90 ~ 1, respectively. This model consistently classifies areas with swirling magnetic lines associated with solar storms and is robust to image rotation, glare, and optical artifacts.  ( 3 min )
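The "2D circular kernel time series transformation" is described only at a high level; one plausible minimal reading, sketched below in numpy under our own assumptions about the sampling, is to trace a circle of fixed radius around a point and read pixel intensities into a 1-D series. Under image rotation such a series only shifts cyclically, which is consistent with the rotation robustness reported above.

```python
import numpy as np

def circular_kernel_series(image, center, radius, n_angles=360):
    """Turn a 2D image patch into a 1D series by sampling pixel intensities
    along a circle of the given radius (nearest-neighbor sampling)."""
    cy, cx = center
    angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    rows = np.clip(np.round(cy + radius * np.sin(angles)).astype(int),
                   0, image.shape[0] - 1)
    cols = np.clip(np.round(cx + radius * np.cos(angles)).astype(int),
                   0, image.shape[1] - 1)
    return image[rows, cols]
```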
    We Should at Least Be Able to Design Molecules That Dock Well. (arXiv:2006.16955v5 [q-bio.BM] UPDATED)
Designing compounds with desired properties is a key element of the drug discovery process. However, measuring progress in the field has been challenging due to the lack of realistic retrospective benchmarks, and the large cost of prospective validation. To close this gap, we propose a benchmark based on docking, a popular computational method for assessing molecule binding to a protein. Concretely, the goal is to generate drug-like molecules that are scored highly by SMINA, a popular docking software. We observe that popular graph-based generative models fail to generate molecules with a high docking score when trained using a realistically sized training set. This suggests a limitation of the current incarnation of models for de novo drug design. Finally, we propose a simplified version of the benchmark based on a simpler scoring function, and show that the tested models are able to partially solve it. We release the benchmark as an easy-to-use package available at https://github.com/cieplinski-tobiasz/smina-docking-benchmark. We hope that our benchmark will serve as a stepping stone towards the goal of automatically generating promising drug candidates.  ( 3 min )
    Detection of sepsis during emergency department triage using machine learning. (arXiv:2204.07657v6 [cs.LG] UPDATED)
Sepsis is a life-threatening condition with organ dysfunction and is a leading cause of death and critical illness worldwide. Even a few hours of delay in the treatment of sepsis results in increased mortality. Early detection of sepsis during emergency department triage would allow early initiation of lab analysis, antibiotic administration, and other sepsis treatment protocols. The purpose of this study was to compare sepsis detection performance at ED triage (prior to the use of laboratory diagnostics) of the standard sepsis screening algorithm (SIRS with source of infection) and a machine learning algorithm trained on EHR triage data. A machine learning model (KATE Sepsis) was developed using patient encounters with triage data from 16 participating hospitals. KATE Sepsis and standard screening were retrospectively evaluated on the adult population of 512,949 medical records. KATE Sepsis demonstrates an AUC of 0.9423 (0.9401 - 0.9441) with sensitivity of 71.09% (70.12% - 71.98%) and specificity of 94.81% (94.75% - 94.87%). Standard screening demonstrates an AUC of 0.6826 (0.6774 - 0.6878) with sensitivity of 40.8% (39.71% - 41.86%) and specificity of 95.72% (95.68% - 95.78%). The KATE Sepsis model trained to detect sepsis demonstrates 77.67% (75.78% - 79.42%) sensitivity in detecting severe sepsis and 86.95% (84.2% - 88.81%) sensitivity in detecting septic shock. The standard screening protocol demonstrates 43.06% (41% - 45.87%) sensitivity in detecting severe sepsis and 40% (36.55% - 43.26%) sensitivity in detecting septic shock. Future research should focus on the prospective impact of KATE Sepsis on administration of antibiotics, readmission rate, morbidity and mortality.  ( 3 min )
    Entropic Issues in Likelihood-Based OOD Detection. (arXiv:2109.10794v2 [stat.ML] UPDATED)
    Deep generative models trained by maximum likelihood remain very popular methods for reasoning about data probabilistically. However, it has been observed that they can assign higher likelihoods to out-of-distribution (OOD) data than in-distribution data, thus calling into question the meaning of these likelihood values. In this work we provide a novel perspective on this phenomenon, decomposing the average likelihood into a KL divergence term and an entropy term. We argue that the latter can explain the curious OOD behaviour mentioned above, suppressing likelihood values on datasets with higher entropy. Although our idea is simple, we have not seen it explored yet in the literature. This analysis provides further explanation for the success of OOD detection methods based on likelihood ratios, as the problematic entropy term cancels out in expectation. Finally, we discuss how this observation relates to recent success in OOD detection with manifold-supported models, for which the above decomposition does not hold directly.  ( 2 min )
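Concretely, the decomposition described above can be written (in our notation, for a model $q_\theta$ and a data distribution $p$) as:

```latex
\mathbb{E}_{x \sim p}\left[\log q_\theta(x)\right]
  \;=\; -\,\mathrm{KL}\!\left(p \,\|\, q_\theta\right) \;-\; H(p)
```

A high-entropy dataset therefore has its average likelihood suppressed through the $H(p)$ term regardless of how well $q_\theta$ matches $p$, while the term cancels when two models are compared via a likelihood ratio on the same data.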
    WARM: A Weakly (+Semi) Supervised Model for Solving Math word Problems. (arXiv:2104.06722v2 [cs.CL] UPDATED)
Solving math word problems (MWPs) is an important and challenging problem in natural language processing. Existing approaches to solve MWPs require full supervision in the form of intermediate equations. However, labeling every MWP with its corresponding equations is a time-consuming and expensive task. In order to address this challenge of equation annotation, we propose a weakly supervised model for solving MWPs by requiring only the final answer as supervision. We approach this problem by first learning to generate the equation using the problem description and the final answer, which we subsequently use to train a supervised MWP solver. We propose and compare various weakly supervised techniques to learn to generate equations directly from the problem description and answer. Through extensive experiments, we demonstrate that without using equations for supervision, our approach achieves accuracy gains of 4.5% and 32% over the state-of-the-art weakly supervised approach, on the standard Math23K and AllArith datasets respectively. Additionally, we curate and release new datasets of roughly 10k MWPs each in English and in Hindi (a low-resource language). These datasets are suitable for training weakly supervised models. We also present an extension of WARM to semi-supervised learning and present further improvements on results, along with insights.  ( 2 min )
    $m^\ast$ of two-dimensional electron gas: a neural canonical transformation study. (arXiv:2201.03156v2 [cond-mat.stat-mech] UPDATED)
    The quasiparticle effective mass $m^\ast$ of interacting electrons is a fundamental quantity in the Fermi liquid theory. However, the precise value of the effective mass of uniform electron gas is still elusive after decades of research. The newly developed neural canonical transformation approach [Xie et al., J. Mach. Learn. 1, (2022)] offers a principled way to extract the effective mass of electron gas by directly calculating the thermal entropy at low temperature. The approach models a variational many-electron density matrix using two generative neural networks: an autoregressive model for momentum occupation and a normalizing flow for electron coordinates. Our calculation reveals a suppression of effective mass in the two-dimensional spin-polarized electron gas, which is more pronounced than previous reports in the low-density strong-coupling region. This prediction calls for verification in two-dimensional electron gas experiments.  ( 2 min )
    Short-Term Density Forecasting of Low-Voltage Load using Bernstein-Polynomial Normalizing Flows. (arXiv:2204.13939v3 [cs.LG] UPDATED)
    The transition to a fully renewable energy grid requires better forecasting of demand at the low-voltage level to increase efficiency and ensure reliable control. However, high fluctuations and increasing electrification cause huge forecast variability, not reflected in traditional point estimates. Probabilistic load forecasts take future uncertainties into account and thus allow more informed decision-making for the planning and operation of low-carbon energy systems. We propose an approach for flexible conditional density forecasting of short-term load based on Bernstein polynomial normalizing flows, where a neural network controls the parameters of the flow. In an empirical study with 363 smart meter customers, our density predictions compare favorably against Gaussian and Gaussian mixture densities. Also, they outperform a non-parametric approach based on the pinball loss for 24h-ahead load forecasting for two different neural network architectures.  ( 2 min )
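As a rough illustration of the core building block (not the authors' code), a monotone Bernstein polynomial can serve as the flow's bijection; monotonicity, and hence invertibility, holds when the coefficients are nondecreasing. The coefficient construction below is an assumption made for the sketch.

```python
import numpy as np
from scipy.stats import binom

def bernstein_transform(y, theta):
    """Monotone Bernstein-polynomial map usable as a flow bijection:
    B(y) = sum_k theta_k * C(M, k) * y^k * (1 - y)^(M - k), y in [0, 1].
    Monotone when the coefficients theta are nondecreasing."""
    M = len(theta) - 1
    k = np.arange(M + 1)
    basis = binom.pmf(k[None, :], M, y[:, None])  # Bernstein basis at each y
    return basis @ theta

y = np.linspace(0.01, 0.99, 5)
theta = np.cumsum(np.abs(np.random.randn(9)))  # nondecreasing -> monotone map
print(bernstein_transform(y, theta))
```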
    Towards Green Automated Machine Learning: Status Quo and Future Directions. (arXiv:2111.05850v4 [cs.LG] UPDATED)
Automated machine learning (AutoML) strives for the automatic configuration of machine learning algorithms and their composition into an overall (software) solution - a machine learning pipeline - tailored to the learning task (dataset) at hand. Over the last decade, AutoML has developed into an independent research field with hundreds of contributions. At the same time, AutoML is being criticised for its high resource consumption, as many approaches rely on the (costly) evaluation of many machine learning pipelines, as well as on expensive large-scale experiments across many datasets and approaches. In the spirit of recent work on Green AI, this paper proposes Green AutoML, a paradigm to make the whole AutoML process more environmentally friendly. To this end, we first elaborate on how to quantify the environmental footprint of an AutoML tool. Afterward, different strategies on how to design and benchmark an AutoML tool with respect to its "greenness", i.e. sustainability, are summarized. Finally, we elaborate on how to be transparent about the environmental footprint and what kind of research incentives could direct the community into a more sustainable AutoML research direction. Additionally, we propose a sustainability checklist to be attached to every AutoML paper featuring all core aspects of Green AutoML.  ( 3 min )
    Investigating Membership Inference Attacks under Data Dependencies. (arXiv:2010.12112v4 [cs.CR] UPDATED)
    Training machine learning models on privacy-sensitive data has become a popular practice, driving innovation in ever-expanding fields. This has opened the door to new attacks that can have serious privacy implications. One such attack, the Membership Inference Attack (MIA), exposes whether or not a particular data point was used to train a model. A growing body of literature uses Differentially Private (DP) training algorithms as a defence against such attacks. However, these works evaluate the defence under the restrictive assumption that all members of the training set, as well as non-members, are independent and identically distributed. This assumption does not hold for many real-world use cases in the literature. Motivated by this, we evaluate membership inference with statistical dependencies among samples and explain why DP does not provide meaningful protection (the privacy parameter $\epsilon$ scales with the training set size $n$) in this more general case. We conduct a series of empirical evaluations with off-the-shelf MIAs using training sets built from real-world data showing different types of dependencies among samples. Our results reveal that training set dependencies can severely increase the performance of MIAs, and therefore assuming that data samples are statistically independent can significantly underestimate the performance of MIAs.  ( 2 min )
    On the Omnipresence of Spurious Local Minima in Certain Neural Network Training Problems. (arXiv:2202.12262v2 [cs.LG] UPDATED)
    We study the loss landscape of training problems for deep artificial neural networks with a one-dimensional real output whose activation functions contain an affine segment and whose hidden layers have width at least two. It is shown that such problems possess a continuum of spurious (i.e., not globally optimal) local minima for all target functions that are not affine. In contrast to previous works, our analysis covers all sampling and parameterization regimes, general differentiable loss functions, arbitrary continuous nonpolynomial activation functions, and both the finite- and infinite-dimensional setting. It is further shown that the appearance of the spurious local minima in the considered training problems is a direct consequence of the universal approximation theorem and that the underlying mechanisms also cause, e.g., $L^p$-best approximation problems to be ill-posed in the sense of Hadamard for all networks that do not have a dense image. The latter result also holds without the assumption of local affine linearity and without any conditions on the hidden layers.  ( 2 min )
    Factorized linear discriminant analysis and its application in computational biology. (arXiv:2010.02171v5 [q-bio.QM] UPDATED)
    Navigating the complex landscape of single-cell transcriptomic data presents significant challenges. Central to this challenge is the identification of a meaningful representation of high-dimensional gene expression patterns that sheds light on the structural and functional properties of cell types. Pursuing model interpretability and computational simplicity, we often look for a linear transformation of the original data that aligns with key phenotypic features of cells. In response to this need, we introduce factorized linear discriminant analysis (FLDA), a novel method for linear dimensionality reduction. The crux of FLDA lies in identifying a linear function of gene expression levels that is highly correlated with one phenotypic feature while minimizing the influence of others. To augment this method, we integrate it with a sparsity-based regularization algorithm. This integration is crucial as it selects a subset of genes pivotal to a specific phenotypic feature or a combination thereof. To illustrate the effectiveness of FLDA, we apply it to transcriptomic datasets from neurons in the Drosophila optic lobe. We demonstrate that FLDA not only captures the inherent structural patterns aligned with phenotypic features but also uncovers key genes associated with each phenotype.  ( 2 min )
    Unraveling the ARC Puzzle: Mimicking Human Solutions with Object-Centric Decision Transformer. (arXiv:2306.08204v1 [cs.AI])
    In the pursuit of artificial general intelligence (AGI), we tackle Abstraction and Reasoning Corpus (ARC) tasks using a novel two-pronged approach. We employ the Decision Transformer in an imitation learning paradigm to model human problem-solving, and introduce an object detection algorithm, the Push and Pull clustering method. This dual strategy enhances AI's ARC problem-solving skills and provides insights for AGI progression. Yet, our work reveals the need for advanced data collection tools, robust training datasets, and refined model structures. This study highlights potential improvements for Decision Transformers and propels future AGI research.  ( 2 min )
    Contextual Font Recommendations based on User Intent. (arXiv:2306.08188v1 [cs.HC])
    Adobe Fonts has a rich library of over 20,000 unique fonts that Adobe users utilize for creating graphics, posters, composites etc. Due to the nature of the large library, knowing what font to select can be a daunting task that requires a lot of experience. For most users in Adobe products, especially casual users of Adobe Express, this often means choosing the default font instead of utilizing the rich and diverse fonts available. In this work, we create an intent-driven system to provide contextual font recommendations to users to aid in their creative journey. Our system takes in multilingual text input and recommends suitable fonts based on the user's intent. Based on user entitlements, the mix of free and paid fonts is adjusted. The feature is currently used by millions of Adobe Express users with a CTR of >25%.  ( 2 min )
    Improving Speech Enhancement Performance by Leveraging Contextual Broad Phonetic Class Information. (arXiv:2011.07442v4 [cs.SD] UPDATED)
Previous studies have confirmed that augmenting acoustic features with place/manner articulatory features can guide the speech enhancement (SE) process to consider the broad phonetic properties of the input speech, attaining performance improvements. In this paper, we explore the contextual information of articulatory attributes as additional information to further benefit SE. More specifically, we propose to improve the SE performance by leveraging losses from an end-to-end automatic speech recognition (E2E-ASR) model that predicts the sequence of broad phonetic classes (BPCs). We also developed multi-objective training with ASR and perceptual losses to train the SE system based on a BPC-based E2E-ASR. Experimental results from speech denoising, speech dereverberation, and impaired speech enhancement tasks confirmed that contextual BPC information improves SE performance. Moreover, the SE model trained with the BPC-based E2E-ASR outperforms that with the phoneme-based E2E-ASR. The results suggest that phoneme-level objectives may provide imperfect feedback when the ASR system misclassifies phonemes, and that BPCs could be a better choice. Finally, it is noted that combining the most-confusable phonetic targets into the same BPC when calculating the additional objective can effectively improve the SE performance.  ( 3 min )
    Autoencoding Under Normalization Constraints. (arXiv:2105.05735v3 [cs.LG] UPDATED)
Likelihood is a standard estimate for outlier detection. The specific role of the normalization constraint is to ensure that the out-of-distribution (OOD) regime has a small likelihood when samples are learned using maximum likelihood. Because autoencoders do not possess such a normalization process, they often fail to recognize outliers even when they are obviously OOD. We propose the Normalized Autoencoder (NAE), a normalized probabilistic model constructed from an autoencoder. The probability density of NAE is defined using the reconstruction error of an autoencoder, which differs from how the density is defined in a conventional energy-based model. In our model, normalization is enforced by suppressing the reconstruction of negative samples, significantly improving the outlier detection performance. Our experimental results confirm the efficacy of NAE, both in detecting outliers and in generating in-distribution samples.  ( 2 min )
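A minimal PyTorch sketch of the idea, assuming a generic encoder/decoder pair: the reconstruction error is treated as an energy (an unnormalized negative log-density), and normalization is enforced by raising the energy of negative samples. In the paper negative samples come from an MCMC sampler on the model; only the loss interface is shown here.

```python
import torch
import torch.nn as nn

class NAEEnergy(nn.Module):
    """Energy-based view of an autoencoder:
    E(x) = ||x - dec(enc(x))||^2 acts as the unnormalized negative log-density."""
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder, self.decoder = encoder, decoder

    def energy(self, x):
        recon = self.decoder(self.encoder(x))
        return ((x - recon) ** 2).flatten(1).sum(dim=1)

def nae_loss(model, x_data, x_negative):
    # Push energy (reconstruction error) down on data and up on negative
    # samples; the latter term is what enforces normalization.
    return model.energy(x_data).mean() - model.energy(x_negative).mean()
```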
    Robust Sample Weighting to Facilitate Individualized Treatment Rule Learning for a Target Population. (arXiv:2105.00581v2 [stat.ML] UPDATED)
    Learning individualized treatment rules (ITRs) is an important topic in precision medicine. Current literature mainly focuses on deriving ITRs from a single source population. We consider the observational data setting when the source population differs from a target population of interest. Compared with causal generalization for the average treatment effect which is a scalar quantity, ITR generalization poses new challenges due to the need to model and generalize the rules based on a prespecified class of functions which may not contain the unrestricted true optimal ITR. The aim of this paper is to develop a weighting framework to mitigate the impact of such misspecification and thus facilitate the generalizability of optimal ITRs from a source population to a target population. Our method seeks covariate balance over a non-parametric function class characterized by a reproducing kernel Hilbert space and can improve many ITR learning methods that rely on weights. We show that the proposed method encompasses importance weights and overlap weights as two extreme cases, allowing for a better bias-variance trade-off in between. Numerical examples demonstrate that the use of our weighting method can greatly improve ITR estimation for the target population compared with other weighting methods.  ( 2 min )
    Differentially Private Wireless Federated Learning Using Orthogonal Sequences. (arXiv:2306.08280v1 [cs.IT])
    We propose a novel privacy-preserving uplink over-the-air computation (AirComp) method, termed FLORAS, for single-input single-output (SISO) wireless federated learning (FL) systems. From the communication design perspective, FLORAS eliminates the requirement of channel state information at the transmitters (CSIT) by leveraging the properties of orthogonal sequences. From the privacy perspective, we prove that FLORAS can offer both item-level and client-level differential privacy (DP) guarantees. Moreover, by adjusting the system parameters, FLORAS can flexibly achieve different DP levels at no additional cost. A novel FL convergence bound is derived which, combined with the privacy guarantees, allows for a smooth tradeoff between convergence rate and differential privacy levels. Numerical results demonstrate the advantages of FLORAS compared with the baseline AirComp method, and validate that our analytical results can guide the design of privacy-preserving FL with different tradeoff requirements on the model convergence and privacy levels.  ( 2 min )
    QuantumNAT: Quantum Noise-Aware Training with Noise Injection, Quantization and Normalization. (arXiv:2110.11331v4 [cs.LG] UPDATED)
Parameterized Quantum Circuits (PQC) are promising towards quantum advantage on near-term quantum hardware. However, due to the large quantum noises (errors), the performance of PQC models has a severe degradation on real quantum devices. Take Quantum Neural Network (QNN) as an example: the accuracy gap between noise-free simulation and noisy results on IBMQ-Yorktown for MNIST-4 classification is over 60%. Existing noise mitigation methods are general ones without leveraging unique characteristics of PQC; on the other hand, existing PQC work does not consider the noise effect. To this end, we present QuantumNAT, a PQC-specific framework to perform noise-aware optimizations in both training and inference stages to improve robustness. We experimentally observe that the effect of quantum noise on the PQC measurement outcome is a linear map of the noise-free outcome with a scaling and a shift factor. Motivated by that, we propose post-measurement normalization to mitigate the feature distribution differences between noise-free and noisy scenarios. Furthermore, to improve the robustness against noise, we propose noise injection to the training process by inserting quantum error gates to PQC according to realistic noise models of quantum hardware. Finally, post-measurement quantization is introduced to quantize the measurement outcomes to discrete values, achieving the denoising effect. Extensive experiments on 8 classification tasks using 6 quantum devices demonstrate that QuantumNAT improves accuracy by up to 43%, and achieves over 94% 2-class, 80% 4-class, and 34% 10-class classification accuracy measured on real quantum computers. The code for construction and noise-aware training of PQC is available in the TorchQuantum library.  ( 3 min )
    Learning on Graphs under Label Noise. (arXiv:2306.08194v1 [cs.LG])
    Node classification on graphs is a significant task with a wide range of applications, including social analysis and anomaly detection. Even though graph neural networks (GNNs) have produced promising results on this task, current techniques often presume that label information of nodes is accurate, which may not be the case in real-world applications. To tackle this issue, we investigate the problem of learning on graphs with label noise and develop a novel approach dubbed Consistent Graph Neural Network (CGNN) to solve it. Specifically, we employ graph contrastive learning as a regularization term, which promotes two views of augmented nodes to have consistent representations. Since this regularization term cannot utilize label information, it can enhance the robustness of node representations to label noise. Moreover, to detect noisy labels on the graph, we present a sample selection technique based on the homophily assumption, which identifies noisy nodes by measuring the consistency between the labels with their neighbors. Finally, we purify these confident noisy labels to permit efficient semantic graph learning. Extensive experiments on three well-known benchmark datasets demonstrate the superiority of our CGNN over competing approaches.  ( 2 min )
    Focusing on Potential Named Entities During Active Label Acquisition. (arXiv:2111.03837v3 [cs.CL] UPDATED)
    Named entity recognition (NER) aims to identify mentions of named entities in an unstructured text and classify them into predefined named entity classes. While deep learning-based pre-trained language models help to achieve good predictive performances in NER, many domain-specific NER applications still call for a substantial amount of labeled data. Active learning (AL), a general framework for the label acquisition problem, has been used for NER tasks to minimize the annotation cost without sacrificing model performance. However, the heavily imbalanced class distribution of tokens introduces challenges in designing effective AL querying methods for NER. We propose several AL sentence query evaluation functions that pay more attention to potential positive tokens, and evaluate these proposed functions with both sentence-based and token-based cost evaluation strategies. We also propose a better data-driven normalization approach to penalize sentences that are too long or too short. Our experiments on three datasets from different domains reveal that the proposed approach reduces the number of annotated tokens while achieving better or comparable prediction performance with conventional methods.  ( 2 min )
    MMASD: A Multimodal Dataset for Autism Intervention Analysis. (arXiv:2306.08243v1 [cs.CV])
Autism spectrum disorder (ASD) is a developmental disorder characterized by significant social communication impairments and difficulties perceiving and presenting communication cues. Machine learning techniques have been broadly adopted to facilitate autism studies and assessments. However, computational models are primarily concentrated on specific analyses and validated on private datasets in the autism community, which limits comparisons across models due to privacy-preserving data sharing complications. This work presents a novel privacy-preserving open-source dataset, MMASD, as a MultiModal ASD benchmark dataset, collected from play therapy interventions of children with Autism. MMASD includes data from 32 children with ASD, and 1,315 data samples segmented from over 100 hours of intervention recordings. To promote public access, each data sample consists of four privacy-preserving modalities of data: (1) optical flow, (2) 2D skeleton, (3) 3D skeleton, and (4) clinician ASD evaluation scores of children, e.g., ADOS scores. MMASD aims to assist researchers and therapists in understanding children's cognitive status, monitoring their progress during therapy, and customizing the treatment plan accordingly. It may also inspire downstream tasks such as action quality assessment and interpersonal synchrony estimation. The MMASD dataset can be easily accessed at https://github.com/Li-Jicheng/MMASD-A-Multimodal-Dataset-for-Autism-Intervention-Analysis.  ( 2 min )
    Solving Large-scale Spatial Problems with Convolutional Neural Networks. (arXiv:2306.08191v1 [cs.LG])
    Over the past decade, deep learning research has been accelerated by increasingly powerful hardware, which facilitated rapid growth in the model complexity and the amount of data ingested. This is becoming unsustainable and therefore refocusing on efficiency is necessary. In this paper, we employ transfer learning to improve training efficiency for large-scale spatial problems. We propose that a convolutional neural network (CNN) can be trained on small windows of signals, but evaluated on arbitrarily large signals with little to no performance degradation, and provide a theoretical bound on the resulting generalization error. Our proof leverages shift-equivariance of CNNs, a property that is underexploited in transfer learning. The theoretical results are experimentally supported in the context of mobile infrastructure on demand (MID). The proposed approach is able to tackle MID at large scales with hundreds of agents, which was computationally intractable prior to this work.  ( 2 min )
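The shift-equivariance argument can be seen directly in code: a fully convolutional network has no fixed-size layers, so weights trained on small windows apply unchanged to arbitrarily large inputs. A minimal PyTorch sketch (layer sizes are arbitrary choices of ours):

```python
import torch
import torch.nn as nn

# A fully convolutional network: every layer is size-agnostic, so the same
# weights process inputs of any spatial extent (shift-equivariance).
fcn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=1),
)

small = torch.randn(8, 1, 32, 32)    # train on small windows of the signal
large = torch.randn(1, 1, 512, 512)  # evaluate on a much larger signal
assert fcn(small).shape[-2:] == (32, 32)
assert fcn(large).shape[-2:] == (512, 512)
```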
    POP: Prompt Of Prompts for Continual Learning. (arXiv:2306.08200v1 [cs.CV])
    Continual learning (CL) has attracted increasing attention in the recent past. It aims to mimic the human ability to learn new concepts without catastrophic forgetting. While existing CL methods accomplish this to some extent, they are still prone to semantic drift of the learned feature space. Foundation models, which are endowed with a robust feature representation, learned from very large datasets, provide an interesting substrate for the solution of the CL problem. Recent work has also shown that they can be adapted to specific tasks by prompt tuning techniques that leave the generality of the representation mostly unscathed. An open question is, however, how to learn both prompts that are task specific and prompts that are global, i.e. capture cross-task information. In this work, we propose the Prompt Of Prompts (POP) model, which addresses this goal by progressively learning a group of task-specified prompts and a group of global prompts, denoted as POP, to integrate information from the former. We show that a foundation model equipped with POP learning is able to outperform classic CL methods by a significant margin. Moreover, as prompt tuning only requires a small set of training samples, POP is able to perform CL in the few-shot setting, while still outperforming competing methods trained on the entire dataset.  ( 2 min )
    Re-thinking Model Inversion Attacks Against Deep Neural Networks. (arXiv:2304.01669v2 [cs.LG] UPDATED)
    Model inversion (MI) attacks aim to infer and reconstruct private training data by abusing access to a model. MI attacks have raised concerns about the leaking of sensitive information (e.g. private face images used in training a face recognition system). Recently, several algorithms for MI have been proposed to improve the attack performance. In this work, we revisit MI, study two fundamental issues pertaining to all state-of-the-art (SOTA) MI algorithms, and propose solutions to these issues which lead to a significant boost in attack performance for all SOTA MI. In particular, our contributions are two-fold: 1) We analyze the optimization objective of SOTA MI algorithms, argue that the objective is sub-optimal for achieving MI, and propose an improved optimization objective that boosts attack performance significantly. 2) We analyze "MI overfitting", show that it would prevent reconstructed images from learning semantics of training data, and propose a novel "model augmentation" idea to overcome this issue. Our proposed solutions are simple and improve all SOTA MI attack accuracy significantly. E.g., in the standard CelebA benchmark, our solutions improve accuracy by 11.8% and achieve for the first time over 90% attack accuracy. Our findings demonstrate that there is a clear risk of leaking sensitive information from deep learning models. We urge serious consideration to be given to the privacy implications. Our code, demo, and models are available at https://ngoc-nguyen-0.github.io/re-thinking_model_inversion_attacks/  ( 3 min )
    Fast exploration and learning of latent graphs with aliased observations. (arXiv:2303.07397v3 [cs.LG] UPDATED)
    We consider the problem of recovering a latent graph where the observations at each node are \emph{aliased}, and transitions are stochastic. Observations are gathered by an agent traversing the graph. Aliasing means that multiple nodes emit the same observation, so the agent can not know in which node it is located. The agent needs to uncover the hidden topology as accurately as possible and in as few steps as possible. This is equivalent to efficient recovery of the transition probabilities of a partially observable Markov decision process (POMDP) in which the observation probabilities are known. An algorithm for efficiently exploring (and ultimately recovering) the latent graph is provided. Our approach is exponentially faster than naive exploration in a variety of challenging topologies with aliased observations while remaining competitive with existing baselines in the unaliased regime.  ( 2 min )
    Prompt-Augmented Linear Probing: Scaling beyond the Limit of Few-shot In-Context Learners. (arXiv:2212.10873v3 [cs.CL] UPDATED)
    Through in-context learning (ICL), large-scale language models are effective few-shot learners without additional model fine-tuning. However, the ICL performance does not scale well with the number of available training samples as it is limited by the inherent input length constraint of the underlying language model. Meanwhile, many studies have revealed that language models are also powerful feature extractors, allowing them to be utilized in a black-box manner and enabling the linear probing paradigm, where lightweight discriminators are trained on top of the pre-extracted input representations. This paper proposes prompt-augmented linear probing (PALP), a hybrid of linear probing and ICL, which leverages the best of both worlds. PALP inherits the scalability of linear probing and the capability of enforcing language models to derive more meaningful representations via tailoring input into a more conceivable form. Throughout in-depth investigations on various datasets, we verified that PALP significantly enhances the input representations closing the gap between ICL in the data-hungry scenario and fine-tuning in the data-abundant scenario with little training overhead, potentially making PALP a strong alternative in a black-box scenario.  ( 2 min )
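A minimal sketch of the idea using scikit-learn, where lm_embed stands in for any black-box text-to-vector encoder and the prompt template is our assumption, not the paper's API:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def palp_probe(lm_embed, template, texts, labels):
    """Prompt-augmented linear probing: wrap each input in a task prompt
    before extracting frozen LM features, then fit a lightweight probe."""
    feats = np.stack([lm_embed(template.format(x)) for x in texts])
    return LogisticRegression(max_iter=1000).fit(feats, labels)

# e.g. template = "Review: {}\nSentiment:" steers the representation toward
# the task before the linear head is trained on top of it.
```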
    Machine-Learned Premise Selection for Lean. (arXiv:2304.00994v2 [cs.AI] UPDATED)
    We introduce a machine-learning-based tool for the Lean proof assistant that suggests relevant premises for theorems being proved by a user. The design principles for the tool are (1) tight integration with the proof assistant, (2) ease of use and installation, (3) a lightweight and fast approach. For this purpose, we designed a custom version of the random forest model, trained in an online fashion. It is implemented directly in Lean, which was possible thanks to the rich and efficient metaprogramming features of Lean 4. The random forest is trained on data extracted from mathlib -- Lean's mathematics library. We experiment with various options for producing training features and labels. The advice from a trained model is accessible to the user via the suggest_premises tactic which can be called in an editor while constructing a proof interactively.  ( 2 min )
    Contextual Multilingual Spellchecker for User Queries. (arXiv:2305.01082v2 [cs.CL] UPDATED)
Spellchecking is one of the most fundamental and widely used search features. Correcting incorrectly spelled user queries not only enhances the user experience but is expected by the user. However, most widely available spellchecking solutions are either lower accuracy than state-of-the-art solutions or too slow to be used for search use cases where latency is a key requirement. Furthermore, most innovative recent architectures focus on English, are not trained in a multilingual fashion, and are trained for spell correction in longer text, which is a different paradigm from spell correction for user queries, where context is sparse (most queries are 1-2 words long). Finally, since most enterprises have unique vocabularies such as product names, off-the-shelf spelling solutions fall short of users' needs. In this work, we build a multilingual spellchecker that is extremely fast and scalable and that adapts its vocabulary, and hence speller output, based on a specific product's needs. Furthermore, our speller outperforms general-purpose spellers by a wide margin on in-domain datasets. Our multilingual speller is used in search in Adobe products, powering autocomplete in various applications.  ( 2 min )
    Scalable Adaptive Computation for Iterative Generation. (arXiv:2212.11972v2 [cs.LG] UPDATED)
Natural data is redundant yet predominant architectures tile computation uniformly across their input and output space. We propose the Recurrent Interface Networks (RINs), an attention-based architecture that decouples its core computation from the dimensionality of the data, enabling adaptive computation for more scalable generation of high-dimensional data. RINs focus the bulk of computation (i.e. global self-attention) on a set of latent tokens, using cross-attention to read and write (i.e. route) information between latent and data tokens. Stacking RIN blocks allows bottom-up (data to latent) and top-down (latent to data) feedback, leading to deeper and more expressive routing. While this routing introduces challenges, this is less problematic in recurrent computation settings where the task (and routing problem) changes gradually, such as iterative generation with diffusion models. We show how to leverage recurrence by conditioning the latent tokens at each forward pass of the reverse diffusion process with those from prior computation, i.e. latent self-conditioning. RINs yield state-of-the-art pixel diffusion models for image and video generation, scaling to 1024x1024 images without cascades or guidance, while being domain-agnostic and up to 10x more efficient than 2D and 3D U-Nets.  ( 2 min )
    B-BACN: Bayesian Boundary-Aware Convolutional Network for Crack Characterization. (arXiv:2302.06827v3 [cs.CV] UPDATED)
    Accurately detecting crack boundaries is crucial for reliability assessment and risk management of structures and materials, such as structural health monitoring, diagnostics, prognostics, and maintenance scheduling. Uncertainty quantification of crack detection is challenging due to various stochastic factors, such as measurement noises, signal processing, and model simplifications. A machine learning-based approach is proposed to quantify both epistemic and aleatoric uncertainties concurrently. We introduce a Bayesian Boundary-Aware Convolutional Network (B-BACN) that emphasizes uncertainty-aware boundary refinement to generate precise and reliable crack boundary detections. The proposed method employs a multi-task learning approach, where we use Monte Carlo Dropout to learn the epistemic uncertainty and a Gaussian sampling function to predict each sample's aleatoric uncertainty. Moreover, we include a boundary refinement loss to B-BACN to enhance the determination of defect boundaries. The proposed method is demonstrated with benchmark experimental results and compared with several existing methods. The experimental results illustrate the effectiveness of our proposed approach in uncertainty-aware crack boundary detection, minimizing misclassification rate, and improving model calibration capabilities.  ( 2 min )
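A hedged sketch of the two uncertainty estimates described above, assuming a network with dropout layers whose forward pass returns a per-pixel mean and log-variance (the two-headed output is our assumption, not the paper's exact architecture):

```python
import torch

def predict_with_uncertainty(model, x, n_samples=20):
    """Epistemic uncertainty via Monte Carlo Dropout; aleatoric uncertainty
    via a predicted per-pixel variance head."""
    model.train()  # keep dropout active at inference (assumes no batch-norm,
                   # or that batch-norm layers are handled separately)
    means, log_vars = [], []
    with torch.no_grad():
        for _ in range(n_samples):
            mean, log_var = model(x)
            means.append(mean)
            log_vars.append(log_var)
    means = torch.stack(means)
    epistemic = means.var(dim=0)                     # spread across dropout masks
    aleatoric = torch.stack(log_vars).exp().mean(0)  # average predicted variance
    return means.mean(dim=0), epistemic, aleatoric
```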
    MedNeXt: Transformer-driven Scaling of ConvNets for Medical Image Segmentation. (arXiv:2303.09975v3 [eess.IV] UPDATED)
    There has been exploding interest in embracing Transformer-based architectures for medical image segmentation. However, the lack of large-scale annotated medical datasets make achieving performances equivalent to those in natural images challenging. Convolutional networks, in contrast, have higher inductive biases and consequently, are easily trainable to high performance. Recently, the ConvNeXt architecture attempted to modernize the standard ConvNet by mirroring Transformer blocks. In this work, we improve upon this to design a modernized and scalable convolutional architecture customized to challenges of data-scarce medical settings. We introduce MedNeXt, a Transformer-inspired large kernel segmentation network which introduces - 1) A fully ConvNeXt 3D Encoder-Decoder Network for medical image segmentation, 2) Residual ConvNeXt up and downsampling blocks to preserve semantic richness across scales, 3) A novel technique to iteratively increase kernel sizes by upsampling small kernel networks, to prevent performance saturation on limited medical data, 4) Compound scaling at multiple levels (depth, width, kernel size) of MedNeXt. This leads to state-of-the-art performance on 4 tasks on CT and MRI modalities and varying dataset sizes, representing a modernized deep architecture for medical image segmentation. Our code is made publicly available at: https://github.com/MIC-DKFZ/MedNeXt.  ( 2 min )
    The Training Process of Many Deep Networks Explores the Same Low-Dimensional Manifold. (arXiv:2305.01604v2 [cs.LG] UPDATED)
    We develop information-geometric techniques to analyze the trajectories of the predictions of deep networks during training. By examining the underlying high-dimensional probabilistic models, we reveal that the training process explores an effectively low-dimensional manifold. Networks with a wide range of architectures, sizes, trained using different optimization methods, regularization techniques, data augmentation techniques, and weight initializations lie on the same manifold in the prediction space. We study the details of this manifold to find that networks with different architectures follow distinguishable trajectories but other factors have a minimal influence; larger networks train along a similar manifold as that of smaller networks, just faster; and networks initialized at very different parts of the prediction space converge to the solution along a similar manifold.  ( 2 min )
    B-Learner: Quasi-Oracle Bounds on Heterogeneous Causal Effects Under Hidden Confounding. (arXiv:2304.10577v2 [cs.LG] UPDATED)
    Estimating heterogeneous treatment effects from observational data is a crucial task across many fields, helping policy and decision-makers take better actions. There has been recent progress on robust and efficient methods for estimating the conditional average treatment effect (CATE) function, but these methods often do not take into account the risk of hidden confounding, which could arbitrarily and unknowingly bias any causal estimate based on observational data. We propose a meta-learner called the B-Learner, which can efficiently learn sharp bounds on the CATE function under limits on the level of hidden confounding. We derive the B-Learner by adapting recent results for sharp and valid bounds of the average treatment effect (Dorn et al., 2021) into the framework given by Kallus & Oprescu (2023) for robust and model-agnostic learning of conditional distributional treatment effects. The B-Learner can use any function estimator such as random forests and deep neural networks, and we prove its estimates are valid, sharp, efficient, and have a quasi-oracle property with respect to the constituent estimators under more general conditions than existing methods. Semi-synthetic experimental comparisons validate the theoretical findings, and we use real-world data to demonstrate how the method might be used in practice.  ( 2 min )
    Pretraining Language Models with Human Preferences. (arXiv:2302.08582v2 [cs.CL] UPDATED)
Language models (LMs) are pretrained to imitate internet text, including content that would violate human preferences if generated by an LM: falsehoods, offensive comments, personally identifiable information, low-quality or buggy code, and more. Here, we explore alternative objectives for pretraining LMs in a way that also guides them to generate text aligned with human preferences. We benchmark five objectives for pretraining with human feedback across three tasks and study how they affect the trade-off between alignment and capabilities of pretrained LMs. We find a Pareto-optimal and simple approach among those we explored: conditional training, or learning a distribution over tokens conditioned on their human preference scores given by a reward model. Conditional training reduces the rate of undesirable content by up to an order of magnitude, both when generating without a prompt and with an adversarially-chosen prompt. Moreover, conditional training maintains the downstream task performance of standard LM pretraining, both before and after task-specific finetuning. Pretraining with human feedback results in much better preference satisfaction than standard LM pretraining followed by finetuning with feedback, i.e., learning and then unlearning undesirable behavior. Our results suggest that we should move beyond imitation learning when pretraining LMs and incorporate human preferences from the start of training.  ( 2 min )
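A toy sketch of the conditional-training data format, where the control tokens and the threshold are our assumptions rather than the paper's exact vocabulary:

```python
def make_conditional_example(text, reward, threshold=0.5):
    """Conditional training: prefix each pretraining document with a control
    token derived from its reward-model score, so the LM learns a
    distribution over tokens conditioned on the preference signal."""
    tag = "<|good|>" if reward >= threshold else "<|bad|>"
    return tag + " " + text

# At inference time, generation is conditioned on the desired control token:
prompt = "<|good|> Once upon a time"
```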
    Towards Model-Agnostic Federated Learning over Networks. (arXiv:2302.04363v2 [cs.LG] UPDATED)
We present a model-agnostic federated learning method for networks of heterogeneous data and models. The network structure reflects similarities between the (statistics of) local datasets and, in turn, their associated local ("personal") models. Our method is an instance of empirical risk minimization, with the regularization term derived from the network structure of data. In particular, we require well-connected local models, forming clusters, to yield similar predictions on a common test set. The proposed method allows for a wide range of local models. The only restriction on these local models is that they allow for efficient implementation of regularized empirical risk minimization (training). For a wide range of models, such implementations are available in high-level programming libraries including scikit-learn, Keras or PyTorch.  ( 2 min )
  • Open

    Kernel Debiased Plug-in Estimation. (arXiv:2306.08598v1 [stat.ME])
We consider the problem of estimating a scalar target parameter in the presence of nuisance parameters. Replacing the unknown nuisance parameter with a nonparametric estimator, e.g., a machine learning (ML) model, is convenient but has been shown to be inefficient due to large biases. Modern methods, such as targeted minimum loss-based estimation (TMLE) and double machine learning (DML), achieve optimal performance under flexible assumptions by harnessing ML estimates while mitigating the plug-in bias. To avoid a sub-optimal bias-variance trade-off, these methods perform a debiasing step of the plug-in pre-estimate. Existing debiasing methods require the influence function (IF) of the target parameter as input. However, deriving the IF requires specialized expertise and thus obstructs the adoption of these methods by practitioners. We propose a novel way to debias plug-in estimators which (i) is efficient, (ii) does not require the IF to be implemented, (iii) is computationally tractable, and therefore can be readily adapted to new estimation problems and automated without analytic derivations by the user. We build on the TMLE framework and update a plug-in estimate with a regularized likelihood maximization step over a nonparametric model constructed with a reproducing kernel Hilbert space (RKHS), producing an efficient plug-in estimate for any regular target parameter. Our method, thus, offers the efficiency of competing debiasing techniques without sacrificing the utility of the plug-in approach.  ( 2 min )
    Leveraging Evolutionary Changes for Software Process Quality. (arXiv:2305.18061v2 [cs.SE] UPDATED)
Real-world software applications must constantly evolve to remain relevant. This evolution occurs when developing new applications or adapting existing ones to meet new requirements, make corrections, or incorporate future functionality. Traditional methods of software quality control involve software quality models and continuous code inspection tools. These measures focus on directly assessing the quality of the software. However, there is a strong correlation and causation between the quality of the development process and the resulting software product. Therefore, improving the development process indirectly improves the software product, too. To achieve this, effective learning from past processes is necessary, often embraced through post-mortem organizational learning. While qualitative evaluation of large artifacts is common, smaller quantitative changes captured by application lifecycle management are often overlooked. In addition to software metrics, these smaller changes can reveal complex phenomena related to project culture and management. Leveraging these changes can help detect and address such complex issues. Software evolution was previously measured by the size of changes, but the lack of consensus on a reliable and versatile quantification method prevents its use as a dependable metric. Different size classifications fail to reliably describe the nature of evolution. While application lifecycle management data is rich, identifying which artifacts can model detrimental managerial practices remains uncertain. Approaches such as simulation modeling, discrete events simulation, or Bayesian networks have only a limited ability to exploit continuous-time process models of such phenomena. Even worse, the accessibility and mechanistic insight into such gray- or black-box models are typically very low. To address these challenges, we suggest leveraging objectively [...]  ( 3 min )
    Multi-Loss Convolutional Network with Time-Frequency Attention for Speech Enhancement. (arXiv:2306.08956v1 [cs.SD])
The Dual-Path Convolution Recurrent Network (DPCRN) was proposed to effectively exploit time-frequency domain information. By combining the DPRNN module with the Convolution Recurrent Network (CRN), the DPCRN achieved promising performance in speech separation with a limited model size. In this paper, we explore self-attention in the DPCRN module and design a model called Multi-Loss Convolutional Network with Time-Frequency Attention (MNTFA) for speech enhancement. We use self-attention modules to exploit long-range information, where the intra-chunk self-attentions are used to model the spectrum pattern and the inter-chunk self-attentions are used to model the dependence between consecutive frames. Compared to DPRNN, axial self-attention greatly reduces the need for memory and computation, which is more suitable for long sequences of speech signals. In addition, we propose a joint training method of a multi-resolution STFT loss and a WavLM loss using a pre-trained WavLM network. Experiments show that with only 0.23M parameters, the proposed model achieves better performance than DPCRN.  ( 2 min )
    Revisiting the Gumbel-Softmax in MADDPG. (arXiv:2302.11793v2 [cs.LG] UPDATED)
    MADDPG is an algorithm in multi-agent reinforcement learning (MARL) that extends the popular single-agent method, DDPG, to multi-agent scenarios. Importantly, DDPG is an algorithm designed for continuous action spaces, where the gradient of the state-action value function exists. For this algorithm to work in discrete action spaces, discrete gradient estimation must be performed. For MADDPG, the Gumbel-Softmax (GS) estimator is used -- a reparameterisation which relaxes a discrete distribution into a similar continuous one. This method, however, is statistically biased, and a recent MARL benchmarking paper suggests that this bias makes MADDPG perform poorly in grid-world situations, where the action space is discrete. Fortunately, many alternatives to the GS exist, boasting a wide range of properties. This paper explores several of these alternatives and integrates them into MADDPG for discrete grid-world scenarios. The corresponding impact on various performance metrics is then measured and analysed. It is found that one of the proposed estimators performs significantly better than the original GS in several tasks, achieving up to 55% higher returns, along with faster convergence.  ( 2 min )
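As a concrete reference point, here is a minimal sketch of Gumbel-Softmax action sampling with an optional straight-through (one-hot) output, the form typically used in MADDPG-style implementations; the helper name, temperature, and action count are illustrative rather than taken from the paper.

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits: torch.Tensor, tau: float = 1.0, hard: bool = False):
    """Relax a categorical distribution into a differentiable sample.

    Adds Gumbel(0, 1) noise to the logits and applies a temperature-scaled
    softmax. With hard=True, a straight-through estimator returns a one-hot
    action in the forward pass while keeping the soft gradient.
    """
    gumbel_noise = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    y_soft = F.softmax((logits + gumbel_noise) / tau, dim=-1)
    if hard:
        index = y_soft.argmax(dim=-1, keepdim=True)
        y_hard = torch.zeros_like(y_soft).scatter_(-1, index, 1.0)
        # Straight-through: forward pass uses y_hard, backward uses y_soft's gradient.
        return y_hard - y_soft.detach() + y_soft
    return y_soft

# Example: sample a discrete action for a 5-action grid-world agent.
logits = torch.randn(1, 5, requires_grad=True)
action = gumbel_softmax_sample(logits, tau=0.5, hard=True)
```

With hard=True the forward pass emits a discrete one-hot action while gradients flow through the soft relaxation; this relaxation is exactly where the statistical bias discussed in the abstract enters.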
    Optimistic Planning by Regularized Dynamic Programming. (arXiv:2302.14004v3 [cs.LG] UPDATED)
    We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes based on the idea of adding regularization to the updates of an otherwise standard approximate value iteration procedure. This technique allows us to avoid contraction and monotonicity arguments typically required by existing analyses of approximate dynamic programming methods, and in particular to use approximate transition functions estimated via least-squares procedures in MDPs with linear function approximation. We use our method to recover known guarantees in tabular MDPs and to provide a computationally efficient algorithm for learning near-optimal policies in discounted linear mixture MDPs from a single stream of experience, and show it achieves near-optimal statistical guarantees.  ( 2 min )
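The abstract does not spell out its particular regularizer, so the sketch below only illustrates the general pattern with a standard entropy-regularized (soft) backup in a tabular MDP; the function names and the choice of regularizer are assumptions, not the paper's exact procedure.

```python
import numpy as np
from scipy.special import logsumexp

def regularized_value_iteration(P, R, gamma=0.9, eta=0.1, iters=500):
    """Tabular value iteration with an entropy-regularized (soft) backup.

    P: (A, S, S) transition probabilities, R: (S, A) rewards. The hard max
    over actions is replaced by eta * logsumexp(Q / eta), which smooths the
    update and recovers the standard backup as eta -> 0.
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = R + gamma * np.einsum("asn,n->sa", P, V)  # (S, A) action values
        V = eta * logsumexp(Q / eta, axis=1)          # soft maximum over actions
    return V, Q

# Tiny random MDP to exercise the routine.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(4), size=(2, 4))  # 2 actions, 4 states
R = rng.standard_normal((4, 2))
V, Q = regularized_value_iteration(P, R)
```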
    Contextual Combinatorial Bandits with Probabilistically Triggered Arms. (arXiv:2303.17110v2 [cs.LG] UPDATED)
    We study contextual combinatorial bandits with probabilistically triggered arms (C$^2$MAB-T) under a variety of smoothness conditions that capture a wide range of applications, such as contextual cascading bandits and contextual influence maximization bandits. Under the triggering probability modulated (TPM) condition, we devise the C$^2$-UCB-T algorithm and propose a novel analysis that achieves an $\tilde{O}(d\sqrt{KT})$ regret bound, removing a potentially exponentially large factor $O(1/p_{\min})$, where $d$ is the dimension of contexts, $p_{\min}$ is the minimum positive probability that any arm can be triggered, and batch-size $K$ is the maximum number of arms that can be triggered per round. Under the variance modulated (VM) or triggering probability and variance modulated (TPVM) conditions, we propose a new variance-adaptive algorithm VAC$^2$-UCB and derive a regret bound $\tilde{O}(d\sqrt{T})$, which is independent of the batch-size $K$. As a valuable by-product, our analysis technique and variance-adaptive algorithm can be applied to the CMAB-T and C$^2$MAB setting, improving existing results there as well. We also include experiments that demonstrate the improved performance of our algorithms compared with benchmark algorithms on synthetic and real-world datasets.
    Optimal Best-Arm Identification in Bandits with Access to Offline Data. (arXiv:2306.09048v1 [cs.LG])
    Learning paradigms based purely on offline data as well as those based solely on sequential online learning have been well-studied in the literature. In this paper, we consider combining offline data with online learning, an area less studied but of obvious practical importance. We consider the stochastic $K$-armed bandit problem, where our goal is to identify the arm with the highest mean in the presence of relevant offline data, with confidence $1-\delta$. We conduct a lower bound analysis on policies that provide such $1-\delta$ probabilistic correctness guarantees. We develop algorithms that match the lower bound on sample complexity when $\delta$ is small. Our algorithms are computationally efficient with an average per-sample acquisition cost of $\tilde{O}(K)$, and rely on a careful characterization of the optimality conditions of the lower bound problem.
    Optimal Exploration for Model-Based RL in Nonlinear Systems. (arXiv:2306.09210v1 [cs.LG])
Learning to control unknown nonlinear dynamical systems is a fundamental problem in reinforcement learning and control theory. A commonly applied approach is to first explore the environment (exploration), learn an accurate model of it (system identification), and then compute an optimal controller with the minimum cost on this estimated system (policy optimization). While existing work has shown that it is possible to learn a uniformly good model of the system (Mania et al., 2020), in practice, if we aim to learn a good controller with a low cost on the actual system, certain system parameters may be significantly more critical than others, and we therefore ought to focus our exploration on learning such parameters. In this work, we consider the setting of nonlinear dynamical systems and seek to formally quantify, in such settings, (a) which parameters are most relevant to learning a good controller, and (b) how we can best explore so as to minimize uncertainty in such parameters. Inspired by recent work in linear systems (Wagenmaker et al., 2021), we show that minimizing the controller loss in nonlinear systems translates to estimating the system parameters in a particular, task-dependent metric. Motivated by this, we develop an algorithm able to efficiently explore the system to reduce uncertainty in this metric, and prove a lower bound showing that our approach learns a controller at a near-instance-optimal rate. Our algorithm relies on a general reduction from policy optimization to optimal experiment design in arbitrary systems, and may be of independent interest. We conclude with experiments demonstrating the effectiveness of our method in realistic nonlinear robotic systems.
    Large language models predict human sensory judgments across six modalities. (arXiv:2302.01308v2 [cs.CL] UPDATED)
Determining the extent to which the perceptual world can be recovered from language is a longstanding problem in philosophy and cognitive science. We show that state-of-the-art large language models can unlock new insights into this problem by providing a lower bound on the amount of perceptual information that can be extracted from language. Specifically, we elicit pairwise similarity judgments from GPT models across six psychophysical datasets. We show that the judgments are significantly correlated with human data across all domains, recovering well-known representations like the color wheel and pitch spiral. Surprisingly, we find that co-training a model (GPT-4) on vision and language does not necessarily yield improvements specific to the visual modality. To study the influence of specific languages on perception, we also apply the models to a multilingual color-naming task. We find that GPT-4 replicates cross-linguistic variation in English and Russian, illuminating the interaction of language and perception.
    Investigating the Impact of Model Width and Density on Generalization in Presence of Label Noise. (arXiv:2208.08003v4 [cs.LG] UPDATED)
Increasing the size of overparameterized neural networks has been key to achieving state-of-the-art performance. This is captured by the double descent phenomenon, where the test loss follows a decreasing-increasing-decreasing pattern as model width increases. However, the effect of label noise on the test loss curve has not been fully explored. In this work, we uncover an intriguing phenomenon where label noise leads to a "final ascent" in the originally observed double descent curve. Specifically, under a sufficiently large noise-to-sample-size ratio, optimal generalization is achieved at intermediate widths. Through theoretical analysis, we attribute this phenomenon to the shape transition of test loss variance induced by label noise. Furthermore, we extend the final ascent phenomenon to model density and provide the first theoretical characterization showing that reducing density by randomly dropping trainable parameters improves generalization under label noise. We also thoroughly examine the roles of regularization and sample size. Surprisingly, we find that larger $\ell_2$ regularization and robust learning methods against label noise exacerbate the final ascent. We confirm the validity of our findings through extensive experiments on ReLU networks trained on MNIST, ResNets trained on CIFAR-10/100, and InceptionResNet-v2 trained on Stanford Cars with real-world noisy labels.
    Langevin Thompson Sampling with Logarithmic Communication: Bandits and Reinforcement Learning. (arXiv:2306.08803v1 [cs.LG])
    Thompson sampling (TS) is widely used in sequential decision making due to its ease of use and appealing empirical performance. However, many existing analytical and empirical results for TS rely on restrictive assumptions on reward distributions, such as belonging to conjugate families, which limits their applicability in realistic scenarios. Moreover, sequential decision making problems are often carried out in a batched manner, either due to the inherent nature of the problem or to serve the purpose of reducing communication and computation costs. In this work, we jointly study these problems in two popular settings, namely, stochastic multi-armed bandits (MABs) and infinite-horizon reinforcement learning (RL), where TS is used to learn the unknown reward distributions and transition dynamics, respectively. We propose batched $\textit{Langevin Thompson Sampling}$ algorithms that leverage MCMC methods to sample from approximate posteriors with only logarithmic communication costs in terms of batches. Our algorithms are computationally efficient and maintain the same order-optimal regret guarantees of $\mathcal{O}(\log T)$ for stochastic MABs, and $\mathcal{O}(\sqrt{T})$ for RL. We complement our theoretical findings with experimental results.
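A minimal single-round sketch of the idea, assuming Gaussian rewards and a Gaussian prior: instead of an exact conjugate posterior draw, each arm's sample is produced by a few steps of unadjusted Langevin dynamics. Step sizes, batching, and the RL extension in the paper are more involved; all names here are illustrative.

```python
import numpy as np

def langevin_posterior_sample(rewards, prior_var=1.0, noise_var=1.0,
                              step=1e-2, n_steps=200, rng=None):
    """Approximate one posterior sample of an arm's mean via unadjusted
    Langevin dynamics on the negative log posterior (Gaussian likelihood,
    zero-mean Gaussian prior), standing in for an exact conjugate draw."""
    rng = np.random.default_rng() if rng is None else rng
    theta, n = 0.0, len(rewards)
    for _ in range(n_steps):
        # Gradient of the negative log posterior.
        grad = theta / prior_var + (n * theta - np.sum(rewards)) / noise_var
        theta += -step * grad + np.sqrt(2 * step) * rng.standard_normal()
    return theta

def ts_select(arm_rewards, rng):
    """Thompson sampling step: sample each arm's posterior, pull the argmax."""
    samples = [langevin_posterior_sample(r, rng=rng) if len(r) > 0
               else rng.standard_normal() for r in arm_rewards]
    return int(np.argmax(samples))
```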
    Visualizing Deep Neural Networks with Topographic Activation Maps. (arXiv:2204.03528v2 [cs.LG] UPDATED)
Machine Learning with Deep Neural Networks (DNNs) has become a successful tool in solving tasks across various fields of application. However, the complexity of DNNs makes it difficult to understand how they solve their learned task. To improve the explainability of DNNs, we adapt methods from neuroscience that analyze complex and opaque systems. Here, we draw inspiration from how neuroscience uses topographic maps to visualize brain activity. To also visualize activations of neurons in DNNs as topographic maps, we investigate techniques to lay out the neurons in a two-dimensional space such that neurons of similar activity are in the vicinity of each other. In this work, we introduce and compare methods to obtain a topographic layout of neurons in a DNN layer. Moreover, we demonstrate how to use topographic activation maps to identify errors or encoded biases and to visualize training processes. Our novel visualization technique improves the transparency of DNN-based decision-making systems and is interpretable without expert knowledge in Machine Learning.
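One plausible way to obtain such a layout, sketched under the assumption that pairwise activation correlation is the similarity of interest: embed neurons into 2D with metric MDS on a correlation-based distance. The paper compares several layout techniques; this is only an illustrative instance, not the authors' method.

```python
import numpy as np
from sklearn.manifold import MDS

def topographic_layout(activations):
    """Place neurons in 2D so that units with correlated activity end up close.

    activations: (n_samples, n_neurons) matrix of recorded activations.
    Distance between neurons = 1 - Pearson correlation of their activity."""
    corr = np.corrcoef(activations.T)  # (n_neurons, n_neurons)
    dist = 1.0 - corr                  # zero diagonal, symmetric
    coords = MDS(n_components=2, dissimilarity="precomputed",
                 random_state=0).fit_transform(dist)
    return coords  # scatter-plot and color by activation for a given input

acts = np.random.rand(1000, 64)  # e.g., activations of a 64-unit layer
xy = topographic_layout(acts)
```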
    Spherical Inducing Features for Orthogonally-Decoupled Gaussian Processes. (arXiv:2304.14034v2 [cs.LG] UPDATED)
    Despite their many desirable properties, Gaussian processes (GPs) are often compared unfavorably to deep neural networks (NNs) for lacking the ability to learn representations. Recent efforts to bridge the gap between GPs and deep NNs have yielded a new class of inter-domain variational GPs in which the inducing variables correspond to hidden units of a feedforward NN. In this work, we examine some practical issues associated with this approach and propose an extension that leverages the orthogonal decomposition of GPs to mitigate these limitations. In particular, we introduce spherical inter-domain features to construct more flexible data-dependent basis functions for both the principal and orthogonal components of the GP approximation and show that incorporating NN activation features under this framework not only alleviates these shortcomings but is more scalable than alternative strategies. Experiments on multiple benchmark datasets demonstrate the effectiveness of our approach.
    The Neural Covariance SDE: Shaped Infinite Depth-and-Width Networks at Initialization. (arXiv:2206.02768v3 [stat.ML] UPDATED)
    The logit outputs of a feedforward neural network at initialization are conditionally Gaussian, given a random covariance matrix defined by the penultimate layer. In this work, we study the distribution of this random matrix. Recent work has shown that shaping the activation function as network depth grows large is necessary for this covariance matrix to be non-degenerate. However, the current infinite-width-style understanding of this shaping method is unsatisfactory for large depth: infinite-width analyses ignore the microscopic fluctuations from layer to layer, but these fluctuations accumulate over many layers. To overcome this shortcoming, we study the random covariance matrix in the shaped infinite-depth-and-width limit. We identify the precise scaling of the activation function necessary to arrive at a non-trivial limit, and show that the random covariance matrix is governed by a stochastic differential equation (SDE) that we call the Neural Covariance SDE. Using simulations, we show that the SDE closely matches the distribution of the random covariance matrix of finite networks. Additionally, we recover an if-and-only-if condition for exploding and vanishing norms of large shaped networks based on the activation function.
    Differentially Private Domain Adaptation with Theoretical Guarantees. (arXiv:2306.08838v1 [cs.LG])
    In many applications, the labeled data at the learner's disposal is subject to privacy constraints and is relatively limited. To derive a more accurate predictor for the target domain, it is often beneficial to leverage publicly available labeled data from an alternative domain, somewhat close to the target domain. This is the modern problem of supervised domain adaptation from a public source to a private target domain. We present two $(\epsilon, \delta)$-differentially private adaptation algorithms for supervised adaptation, for which we make use of a general optimization problem, recently shown to benefit from favorable theoretical learning guarantees. Our first algorithm is designed for regression with linear predictors and shown to solve a convex optimization problem. Our second algorithm is a more general solution for loss functions that may be non-convex but Lipschitz and smooth. While our main objective is a theoretical analysis, we also report the results of several experiments first demonstrating that the non-private versions of our algorithms outperform adaptation baselines and next showing that, for larger values of the target sample size or $\epsilon$, the performance of our private algorithms remains close to that of the non-private formulation.
    Short-Term Density Forecasting of Low-Voltage Load using Bernstein-Polynomial Normalizing Flows. (arXiv:2204.13939v3 [cs.LG] UPDATED)
    The transition to a fully renewable energy grid requires better forecasting of demand at the low-voltage level to increase efficiency and ensure reliable control. However, high fluctuations and increasing electrification cause huge forecast variability, not reflected in traditional point estimates. Probabilistic load forecasts take future uncertainties into account and thus allow more informed decision-making for the planning and operation of low-carbon energy systems. We propose an approach for flexible conditional density forecasting of short-term load based on Bernstein polynomial normalizing flows, where a neural network controls the parameters of the flow. In an empirical study with 363 smart meter customers, our density predictions compare favorably against Gaussian and Gaussian mixture densities. Also, they outperform a non-parametric approach based on the pinball loss for 24h-ahead load forecasting for two different neural network architectures.
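For intuition, here is a small sketch of the Bernstein-polynomial transformation such a flow applies to a variable $z \in [0, 1]$; in the proposed model a neural network would output the coefficients, which must be non-decreasing for invertibility (the sorting below is a stand-in for that constraint, and all names are illustrative).

```python
import numpy as np
from scipy.stats import binom

def bernstein_transform(z, theta):
    """Evaluate h(z) = sum_k theta_k * b_{k,M}(z) on z in [0, 1], where the
    Bernstein basis b_{k,M}(z) = C(M,k) z^k (1-z)^(M-k) coincides with a
    Binomial(M, z) pmf evaluated at k. Non-decreasing theta makes h monotone,
    which a normalizing flow needs for invertibility."""
    M = len(theta) - 1
    k = np.arange(M + 1)
    basis = binom.pmf(k[None, :], M, z[:, None])  # shape (len(z), M + 1)
    return basis @ theta

z = np.linspace(0.0, 1.0, 5)
theta = np.sort(np.random.default_rng(0).standard_normal(9))  # sorted -> monotone
print(bernstein_transform(z, theta))
```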
    We Should at Least Be Able to Design Molecules That Dock Well. (arXiv:2006.16955v5 [q-bio.BM] UPDATED)
Designing compounds with desired properties is a key element of the drug discovery process. However, measuring progress in the field has been challenging due to the lack of realistic retrospective benchmarks, and the large cost of prospective validation. To close this gap, we propose a benchmark based on docking, a popular computational method for assessing molecule binding to a protein. Concretely, the goal is to generate drug-like molecules that are scored highly by SMINA, a popular docking software. We observe that popular graph-based generative models fail to generate molecules with a high docking score when trained using a realistically sized training set. This suggests a limitation of the current incarnation of models for de novo drug design. Finally, we propose a simplified version of the benchmark based on a simpler scoring function, and show that the tested models are able to partially solve it. We release the benchmark as an easy-to-use package available at https://github.com/cieplinski-tobiasz/smina-docking-benchmark. We hope that our benchmark will serve as a stepping stone towards the goal of automatically generating promising drug candidates.
    Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networks. (arXiv:2202.00293v4 [stat.ML] UPDATED)
    Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achieve global convergence under gradient descent. The picture can be radically different for narrow networks, which tend to get stuck in badly-generalizing local minima. Here we investigate the cross-over between these two regimes in the high-dimensional setting, and in particular investigate the connection between the so-called mean-field/hydrodynamic regime and the seminal approach of Saad & Solla. Focusing on the case of Gaussian data, we study the interplay between the learning rate, the time scale, and the number of hidden units in the high-dimensional dynamics of stochastic gradient descent (SGD). Our work builds on a deterministic description of SGD in high-dimensions from statistical physics, which we extend and for which we provide rigorous convergence rates.
    Intrinsic Sliced Wasserstein Distances for Comparing Collections of Probability Distributions on Manifolds and Graphs. (arXiv:2010.15285v3 [stat.ME] UPDATED)
Collections of probability distributions arise in a variety of applications ranging from user activity pattern analysis to brain connectomics. In practice these distributions can be defined over diverse domain types including finite intervals, circles, cylinders, spheres, other manifolds, and graphs. This paper introduces an approach for detecting differences between two collections of distributions over such general domains. To this end, we propose the intrinsic slicing construction that yields a novel class of Wasserstein distances on manifolds and graphs. These distances are Hilbert embeddable, allowing us to reduce the distribution collection comparison problem to a more familiar mean testing problem in a Hilbert space. We provide two testing procedures: one based on resampling and another based on combining p-values from coordinate-wise tests. Our experiments in various synthetic and real data settings show that the resulting tests are powerful and the p-values are well-calibrated.
    Analysis and Approximate Inference of Large and Dense Random Kronecker Graphs. (arXiv:2306.08489v1 [stat.ML])
Random graph models are playing an increasingly important role in science and industry, and find applications in a variety of fields ranging from social and traffic networks to recommendation systems and molecular genetics. In this paper, we perform an in-depth analysis of the random Kronecker graph model proposed in (Leskovec et al., 2010), when the number of graph vertices $N$ is large. Built upon recent advances in random matrix theory, we show, in the dense regime, that the random Kronecker graph adjacency matrix follows approximately a signal-plus-noise model, with a small-rank (of order at most $\log N$) signal matrix that is linear in the graph parameters and a random noise matrix having a quarter-circle-form singular value distribution. This observation allows us to propose a "denoise-and-solve" meta algorithm to approximately infer the graph parameters, with reduced computational complexity and (asymptotic) performance guarantee. Numerical experiments on graph inference and graph classification with both synthetic and realistic graphs are provided to support the advantageous performance of the proposed approach.
    Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models. (arXiv:2306.09251v1 [stat.ML])
Diffusion models, which convert noise into new data instances by learning to reverse a Markov diffusion process, have become a cornerstone in contemporary generative modeling. While their practical power has now been widely recognized, the theoretical underpinnings remain far from mature. In this work, we develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models in discrete time, assuming access to reliable estimates of the (Stein) score functions. For a popular deterministic sampler (based on the probability flow ODE), we establish a convergence rate proportional to $1/T$ (with $T$ the total number of steps), improving upon past results; for another mainstream stochastic sampler (i.e., a type of denoising diffusion probabilistic model (DDPM)), we derive a convergence rate proportional to $1/\sqrt{T}$, matching the state-of-the-art theory. Our theory imposes only minimal assumptions on the target data distribution (e.g., no smoothness assumption is imposed), and is developed based on an elementary yet versatile non-asymptotic approach without resorting to toolboxes for SDEs and ODEs. Further, we design two accelerated variants, improving the convergence to $1/T^2$ for the ODE-based sampler and $1/T$ for the DDPM-type sampler, which might be of independent theoretical and empirical interest.
    Bootstrap aggregation and confidence measures to improve time series causal discovery. (arXiv:2306.08946v1 [stat.ME])
Causal discovery methods have demonstrated the ability to identify the time series graphs representing the causal temporal dependency structure of dynamical systems. However, they do not include a measure of the confidence of the estimated links. Here, we introduce a novel bootstrap aggregation (bagging) and confidence measure method that is combined with time series causal discovery. This new method allows measuring confidence for the links of the time series graphs calculated by causal discovery methods. This is done by bootstrapping the original time series data set while preserving temporal dependencies. In addition to confidence measures, aggregating the bootstrapped graphs by majority voting yields a final aggregated output graph. In this work, we combine our approach with the state-of-the-art conditional-independence-based algorithm PCMCI+. With extensive numerical experiments we empirically demonstrate that, in addition to providing confidence measures for links, Bagged-PCMCI+ improves the precision and recall of its base algorithm PCMCI+. Specifically, Bagged-PCMCI+ has a higher detection power regarding adjacencies and a higher precision in orienting contemporaneous edges, while at the same time showing a lower rate of false positives. These performance improvements are especially pronounced in the more challenging settings (short time sample size, large number of variables, high autocorrelation). Our bootstrap approach can also be combined with other time series causal discovery algorithms and can be of considerable use in many real-world applications, especially when confidence measures for the links are desired.
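A minimal sketch of the bagging-and-voting loop, assuming a user-supplied `discover_graph` callable (a hypothetical wrapper around a method such as PCMCI+) that maps a multivariate series to a binary adjacency matrix; a simple block bootstrap stands in for the paper's dependence-preserving resampling scheme.

```python
import numpy as np

def block_bootstrap(X, block_len, rng):
    """Resample a (T, N) multivariate series in contiguous blocks so that
    short-range temporal dependencies are preserved within each block."""
    T = X.shape[0]
    n_blocks = int(np.ceil(T / block_len))
    starts = rng.integers(0, T - block_len + 1, size=n_blocks)
    blocks = [X[s:s + block_len] for s in starts]
    return np.concatenate(blocks, axis=0)[:T]

def bagged_discovery(X, discover_graph, n_boot=100, block_len=20, seed=0):
    """Run a causal discovery routine on bootstrap replicates and aggregate.

    Returns link frequencies (confidence measures) across replicates and a
    majority-vote output graph."""
    rng = np.random.default_rng(seed)
    votes = np.mean(
        [discover_graph(block_bootstrap(X, block_len, rng)) for _ in range(n_boot)],
        axis=0,
    )
    return votes, (votes > 0.5).astype(int)
```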
    B-Learner: Quasi-Oracle Bounds on Heterogeneous Causal Effects Under Hidden Confounding. (arXiv:2304.10577v2 [cs.LG] UPDATED)
    Estimating heterogeneous treatment effects from observational data is a crucial task across many fields, helping policy and decision-makers take better actions. There has been recent progress on robust and efficient methods for estimating the conditional average treatment effect (CATE) function, but these methods often do not take into account the risk of hidden confounding, which could arbitrarily and unknowingly bias any causal estimate based on observational data. We propose a meta-learner called the B-Learner, which can efficiently learn sharp bounds on the CATE function under limits on the level of hidden confounding. We derive the B-Learner by adapting recent results for sharp and valid bounds of the average treatment effect (Dorn et al., 2021) into the framework given by Kallus & Oprescu (2023) for robust and model-agnostic learning of conditional distributional treatment effects. The B-Learner can use any function estimator such as random forests and deep neural networks, and we prove its estimates are valid, sharp, efficient, and have a quasi-oracle property with respect to the constituent estimators under more general conditions than existing methods. Semi-synthetic experimental comparisons validate the theoretical findings, and we use real-world data to demonstrate how the method might be used in practice.
    Logarithmic Bayes Regret Bounds. (arXiv:2306.09136v1 [cs.LG])
We derive the first finite-time logarithmic regret bounds for Bayesian bandits. For Gaussian bandits, we obtain a $O(c_h \log^2 n)$ bound, where $c_h$ is a prior-dependent constant. This matches the asymptotic lower bound of Lai (1987). Our proofs mark a technical departure from prior works, and are simple and general. To show generality, we apply our technique to linear bandits. Our bounds shed light on the value of the prior in the Bayesian setting, both in the objective and as side information given to the learner. They significantly improve upon the $\tilde{O}(\sqrt{n})$ bounds that, despite the existing lower bounds, have become standard in the literature.
    SOBER: Highly Parallel Bayesian Optimization and Bayesian Quadrature over Discrete and Mixed Spaces. (arXiv:2301.11832v3 [cs.LG] UPDATED)
    Batch Bayesian optimisation and Bayesian quadrature have been shown to be sample-efficient methods of performing optimisation and quadrature where expensive-to-evaluate objective functions can be queried in parallel. However, current methods do not scale to large batch sizes -- a frequent desideratum in practice (e.g. drug discovery or simulation-based inference). We present a novel algorithm, SOBER, which permits scalable and diversified batch global optimisation and quadrature with arbitrary acquisition functions and kernels over discrete and mixed spaces. The key to our approach is to reformulate batch selection for global optimisation as a quadrature problem, which relaxes acquisition function maximisation (non-convex) to kernel recombination (convex). Bridging global optimisation and quadrature can efficiently solve both tasks by balancing the merits of exploitative Bayesian optimisation and explorative Bayesian quadrature. We show that SOBER outperforms 11 competitive baselines on 12 synthetic and diverse real-world tasks.
    WavPool: A New Block for Deep Neural Networks. (arXiv:2306.08734v1 [cs.LG])
Modern deep neural networks comprise many operational layers, such as dense or convolutional layers, which are often collected into blocks. In this work, we introduce a new, wavelet-transform-based network architecture that we call the multi-resolution perceptron: by adding a pooling layer, we create a new network block, the WavPool. The first step of the multi-resolution perceptron is transforming the data into its multi-resolution decomposition form by convolving the input data with filters of fixed coefficients but increasing size. Following image processing techniques, we are able to make scale and spatial information simultaneously accessible to the network without increasing the size of the data vector. WavPool outperforms a similar multilayer perceptron while using fewer parameters, and outperforms a comparable convolutional neural network by ~10% in relative accuracy on CIFAR-10.
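The multi-resolution decomposition step can be sketched with PyWavelets; the pooling and perceptron parts of the WavPool block are omitted, and the choice of the Haar wavelet and three levels is illustrative rather than the paper's configuration.

```python
import numpy as np
import pywt

# Decompose an image into multi-resolution approximation/detail coefficients;
# each level halves the spatial resolution, exposing scale and spatial
# information simultaneously to downstream layers.
image = np.random.rand(32, 32)
coeffs = pywt.wavedec2(image, wavelet="haar", level=3)

approx = coeffs[0]  # coarsest approximation
# Detail tuples are ordered from coarsest to finest level.
for level, (cH, cV, cD) in enumerate(coeffs[1:], start=1):
    print(f"detail set {level}: shapes {cH.shape}, {cV.shape}, {cD.shape}")
```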
    Energy Trees: Regression and Classification With Structured and Mixed-Type Covariates. (arXiv:2207.04430v2 [stat.ME] UPDATED)
    The increasing complexity of data requires methods and models that can effectively handle intricate structures, as simplifying them would result in loss of information. While several analytical tools have been developed to work with complex data objects in their original form, these tools are typically limited to single-type variables. In this work, we propose energy trees as a regression and classification model capable of accommodating structured covariates of various types. Energy trees leverage energy statistics to extend the capabilities of conditional inference trees, from which they inherit sound statistical foundations, interpretability, scale invariance, and freedom from distributional assumptions. We specifically focus on functional and graph-structured covariates, while also highlighting the model's flexibility in integrating other variable types. Extensive simulation studies demonstrate the model's competitive performance in terms of variable selection and robustness to overfitting. Finally, we assess the model's predictive ability through two empirical analyses involving human biological data. Energy trees are implemented in the R package etree.
    Learning Manifold Dimensions with Conditional Variational Autoencoders. (arXiv:2302.11756v2 [cs.LG] UPDATED)
Although the variational autoencoder (VAE) and its conditional extension (CVAE) are capable of state-of-the-art results across multiple domains, their precise behavior is still not fully understood, particularly in the context of data (like images) that lie on or near a low-dimensional manifold. For example, while prior work has suggested that the globally optimal VAE solution can learn the correct manifold dimension, a necessary (but not sufficient) condition for producing samples from the true data distribution, this has never been rigorously proven. Moreover, it remains unclear how such considerations would change when various types of conditioning variables are introduced, or when the data support is extended to a union of manifolds (e.g., as is likely the case for MNIST digits and related datasets). In this work, we address these points by first proving that VAE global minima are indeed capable of recovering the correct manifold dimension. We then extend this result to more general CVAEs, demonstrating practical scenarios whereby the conditioning variables allow the model to adaptively learn manifolds of varying dimension across samples. Our analyses, which have practical implications for various CVAE design choices, are also supported by numerical results on both synthetic and real-world datasets.
    Semantic HELM: An Interpretable Memory for Reinforcement Learning. (arXiv:2306.09312v1 [cs.LG])
Reinforcement learning agents deployed in the real world often have to cope with partially observable environments. Therefore, most agents employ memory mechanisms to approximate the state of the environment. Recently, there have been impressive success stories in mastering partially observable environments, mostly in the realm of computer games like Dota 2, StarCraft II, or MineCraft. However, none of these methods are interpretable in the sense that it is not comprehensible for humans how the agent decides which actions to take based on its inputs. Yet, human understanding is necessary in order to deploy such methods in high-stakes domains like autonomous driving or medical applications. We propose a novel memory mechanism that operates on human language to illuminate the decision-making process. First, we use CLIP to associate visual inputs with language tokens. Then we feed these tokens to a pretrained language model that serves the agent as memory and provides it with a coherent and interpretable representation of the past. Our memory mechanism achieves state-of-the-art performance in environments where memorizing the past is crucial to solve tasks. Further, we present situations where our memory component excels or fails, demonstrating the strengths and weaknesses of our new approach.
    Bandits with Replenishable Knapsacks: the Best of both Worlds. (arXiv:2306.08470v1 [cs.LG])
The bandits with knapsacks (BwK) framework models online decision-making problems in which an agent makes a sequence of decisions subject to resource consumption constraints. The traditional model assumes that each action consumes a non-negative amount of resources and the process ends when the initial budgets are fully depleted. We study a natural generalization of the BwK framework which allows non-monotonic resource utilization, i.e., resources can be replenished by a positive amount. We propose a best-of-both-worlds primal-dual template that can handle any online learning problem with replenishment for which a suitable primal regret minimizer exists. In particular, we provide the first positive results for the case of adversarial inputs by showing that our framework guarantees a constant competitive ratio $\alpha$ when the initial budget $B=\Omega(T)$ or when the possible per-round replenishment is a positive constant. Moreover, under a stochastic input model, our algorithm yields an instance-independent $\tilde{O}(T^{1/2})$ regret bound which complements existing instance-dependent bounds for the same setting. Finally, we provide applications of our framework to some economic problems of practical relevance.
    Hyperbolic Convolution via Kernel Point Aggregation. (arXiv:2306.08862v1 [cs.LG])
    Learning representations according to the underlying geometry is of vital importance for non-Euclidean data. Studies have revealed that the hyperbolic space can effectively embed hierarchical or tree-like data. In particular, the few past years have witnessed a rapid development of hyperbolic neural networks. However, it is challenging to learn good hyperbolic representations since common Euclidean neural operations, such as convolution, do not extend to the hyperbolic space. Most hyperbolic neural networks do not embrace the convolution operation and ignore local patterns. Others either only use non-hyperbolic convolution, or miss essential properties such as equivariance to permutation. We propose HKConv, a novel trainable hyperbolic convolution which first correlates trainable local hyperbolic features with fixed kernel points placed in the hyperbolic space, then aggregates the output features within a local neighborhood. HKConv not only expressively learns local features according to the hyperbolic geometry, but also enjoys equivariance to permutation of hyperbolic points and invariance to parallel transport of a local neighborhood. We show that neural networks with HKConv layers advance state-of-the-art in various tasks.
    On the Exactness of Dantzig-Wolfe Relaxation for Rank Constrained Optimization Problems. (arXiv:2210.16191v3 [math.OC] UPDATED)
The rank-constrained optimization problem (RCOP) minimizes a linear objective function over a prespecified closed rank-constrained domain set and $m$ generic two-sided linear matrix inequalities. Motivated by the Dantzig-Wolfe (DW) decomposition, a popular approach for solving many nonconvex optimization problems, we investigate the strength of the DW relaxation (DWR) of the RCOP, which admits the same formulation as RCOP except replacing the domain set by its closed convex hull. Notably, our goal is to characterize conditions under which the DWR matches RCOP for any $m$ two-sided linear matrix inequalities. From the primal perspective, we develop the first-known simultaneously necessary and sufficient conditions that achieve: (i) extreme point exactness -- all the extreme points of the DWR feasible set belong to that of the RCOP; (ii) convex hull exactness -- the DWR feasible set is identical to the closed convex hull of the RCOP feasible set; and (iii) objective exactness -- the optimal values of the DWR and RCOP coincide. The proposed conditions unify, refine, and extend the existing exactness results in the quadratically constrained quadratic program (QCQP) and fair unsupervised learning. These conditions can be very useful for identifying new results, including the extreme point exactness for a QCQP problem that admits an inhomogeneous objective function with two homogeneous two-sided quadratic constraints and the convex hull exactness for fair SVD.
    Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning. (arXiv:2306.08590v1 [cs.LG])
The success of SGD in deep learning has been ascribed by prior works to the implicit bias induced by a high learning rate or small batch size ("SGD noise"). While prior works focused on offline learning (i.e., multiple-epoch training), we study the impact of SGD noise on online (i.e., single-epoch) learning. Through an extensive empirical analysis of image and language data, we demonstrate that a large learning rate and small batch size do not confer any implicit bias advantages in online learning. In contrast to offline learning, the benefits of SGD noise in online learning are strictly computational, facilitating larger or more cost-effective gradient steps. Our work suggests that SGD in the online regime can be construed as taking noisy steps along the "golden path" of the noiseless gradient flow algorithm. We provide evidence to support this hypothesis by conducting experiments that reduce SGD noise during training and by measuring the pointwise functional distance between models trained with varying SGD noise levels, but at equivalent loss values. Our findings challenge the prevailing understanding of SGD and offer novel insights into its role in online learning.
    Entropic Issues in Likelihood-Based OOD Detection. (arXiv:2109.10794v2 [stat.ML] UPDATED)
    Deep generative models trained by maximum likelihood remain very popular methods for reasoning about data probabilistically. However, it has been observed that they can assign higher likelihoods to out-of-distribution (OOD) data than in-distribution data, thus calling into question the meaning of these likelihood values. In this work we provide a novel perspective on this phenomenon, decomposing the average likelihood into a KL divergence term and an entropy term. We argue that the latter can explain the curious OOD behaviour mentioned above, suppressing likelihood values on datasets with higher entropy. Although our idea is simple, we have not seen it explored yet in the literature. This analysis provides further explanation for the success of OOD detection methods based on likelihood ratios, as the problematic entropy term cancels out in expectation. Finally, we discuss how this observation relates to recent success in OOD detection with manifold-supported models, for which the above decomposition does not hold directly.
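The decomposition in question follows directly from definitions. Writing $q$ for the distribution of the evaluation data and $p_\theta$ for the model density, $\mathbb{E}_{x \sim q}[\log p_\theta(x)] = -\mathrm{KL}(q \,\|\, p_\theta) - \mathbb{H}(q)$, so a high-entropy $q$ depresses the average likelihood no matter how closely $p_\theta$ matches $q$. For a likelihood ratio between two models the entropy term cancels: $\mathbb{E}_{x \sim q}[\log p_\theta(x) - \log p_\phi(x)] = \mathrm{KL}(q \,\|\, p_\phi) - \mathrm{KL}(q \,\|\, p_\theta)$, which is the cancellation the abstract credits for the success of ratio-based OOD detection.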
    Bayesian Fixed-Budget Best-Arm Identification. (arXiv:2211.08572v3 [cs.LG] UPDATED)
    Fixed-budget best-arm identification (BAI) is a bandit problem where the agent maximizes the probability of identifying the optimal arm within a fixed budget of observations. In this work, we study this problem in the Bayesian setting. We propose a Bayesian elimination algorithm and derive an upper bound on its probability of misidentifying the optimal arm. The bound reflects the quality of the prior and is the first distribution-dependent bound in this setting. We prove it using a frequentist-like argument, where we carry the prior through, and then integrate out the bandit instance at the end. We also provide a lower bound on the probability of misidentification in a $2$-armed Bayesian bandit and show that our upper bound (almost) matches it for any budget. Our experiments show that Bayesian elimination is superior to frequentist methods and competitive with the state-of-the-art Bayesian algorithms that have no guarantees in our setting.
    Anticipatory Music Transformer. (arXiv:2306.08620v1 [cs.SD])
We introduce anticipation: a method for constructing a controllable generative model of a temporal point process (the event process) conditioned asynchronously on realizations of a second, correlated process (the control process). We achieve this by interleaving sequences of events and controls, such that controls appear following stopping times in the event sequence. This work is motivated by problems arising in the control of symbolic music generation. We focus on infilling control tasks, whereby the controls are a subset of the events themselves, and conditional generation completes a sequence of events given the fixed control events. We train anticipatory infilling models using the large and diverse Lakh MIDI music dataset. These models match the performance of autoregressive models for prompted music generation, with the additional capability to perform infilling control tasks, including accompaniment. Human evaluators report that an anticipatory model produces accompaniments whose musicality is similar even to that of music composed by humans, over 20-second clips.
    Understanding Optimization of Deep Learning. (arXiv:2306.09338v1 [cs.LG])
This article provides a comprehensive understanding of optimization in deep learning, with a primary focus on the challenges of gradient vanishing and gradient exploding, which normally lead to diminished model representational ability and training instability, respectively. We analyze these two challenges through several strategic measures, including the improvement of gradient flow and the imposition of constraints on a network's Lipschitz constant. To help understand the current optimization methodologies, we categorize them into two classes: explicit optimization and implicit optimization. Explicit optimization methods involve direct manipulation of optimizer parameters, including weight, gradient, learning rate, and weight decay. Implicit optimization methods, by contrast, focus on improving the overall landscape of a network by enhancing its modules, such as residual shortcuts, normalization methods, attention mechanisms, and activations. In this article, we provide an in-depth analysis of these two optimization classes and undertake a thorough examination of the Jacobian matrices and the Lipschitz constants of many widely used deep learning modules, highlighting existing issues as well as potential improvements. Moreover, we also conduct a series of analytical experiments to substantiate our theoretical discussions. This article does not aim to propose a new optimizer or network. Rather, our intention is to present a comprehensive understanding of optimization in deep learning. We hope that this article will assist readers in gaining deeper insight into this field and encourage the development of more robust, efficient, and high-performing models.
    A Gromov--Wasserstein Geometric View of Spectrum-Preserving Graph Coarsening. (arXiv:2306.08854v1 [cs.LG])
    Graph coarsening is a technique for solving large-scale graph problems by working on a smaller version of the original graph, and possibly interpolating the results back to the original graph. It has a long history in scientific computing and has recently gained popularity in machine learning, particularly in methods that preserve the graph spectrum. This work studies graph coarsening from a different perspective, developing a theory for preserving graph distances and proposing a method to achieve this. The geometric approach is useful when working with a collection of graphs, such as in graph classification and regression. In this study, we consider a graph as an element on a metric space equipped with the Gromov--Wasserstein (GW) distance, and bound the difference between the distance of two graphs and their coarsened versions. Minimizing this difference can be done using the popular weighted kernel $K$-means method, which improves existing spectrum-preserving methods with the proper choice of the kernel. The study includes a set of experiments to support the theory and method, including approximating the GW distance, preserving the graph spectrum, classifying graphs using spectral information, and performing regression using graph convolutional networks. Code is available at https://github.com/ychen-stat-ml/GW-Graph-Coarsening .
    MPSA-DenseNet: A novel deep learning model for English accent classification. (arXiv:2306.08798v1 [cs.CL])
This paper presents three innovative deep learning models for English accent classification: Multi-DenseNet, PSA-DenseNet, and MPSA-DenseNet, which combine multi-task learning and the PSA module attention mechanism with DenseNet. We applied these models to data collected from six dialects of English across native English-speaking regions (Britain, the United States, Scotland) and non-native English-speaking regions (China, Germany, India). Our experimental results show a significant improvement in classification accuracy, particularly with MPSA-DenseNet, which outperforms all other models, including the DenseNet and EPSA models previously used for accent identification. Our findings indicate that MPSA-DenseNet is a highly promising model for accurately identifying English accents.
    Tight Certification of Adversarially Trained Neural Networks via Nonconvex Low-Rank Semidefinite Relaxations. (arXiv:2211.17244v3 [cs.LG] UPDATED)
Adversarial training is well-known to produce high-quality neural network models that are empirically robust against adversarial perturbations. Nevertheless, once a model has been adversarially trained, one often desires a certification that the model is truly robust against all future attacks. Unfortunately, when faced with adversarially trained models, all existing approaches have significant trouble making certifications that are strong enough to be practically useful. Linear programming (LP) techniques in particular face a "convex relaxation barrier" that prevents them from making high-quality certifications, even after refinement with mixed-integer linear programming (MILP) and branch-and-bound (BnB) techniques. In this paper, we propose a nonconvex certification technique, based on a low-rank restriction of a semidefinite programming (SDP) relaxation. The nonconvex relaxation makes strong certifications comparable to much more expensive SDP methods, while optimizing over dramatically fewer variables comparable to much weaker LP methods. Despite nonconvexity, we show how off-the-shelf local optimization algorithms can be used to achieve and to certify global optimality in polynomial time. Our experiments find that the nonconvex relaxation almost completely closes the gap towards exact certification of adversarially trained models.
    MMD-FUSE: Learning and Combining Kernels for Two-Sample Testing Without Data Splitting. (arXiv:2306.08777v1 [stat.ML])
    We propose novel statistics which maximise the power of a two-sample test based on the Maximum Mean Discrepancy (MMD), by adapting over the set of kernels used in defining it. For finite sets, this reduces to combining (normalised) MMD values under each of these kernels via a weighted soft maximum. Exponential concentration bounds are proved for our proposed statistics under the null and alternative. We further show how these kernels can be chosen in a data-dependent but permutation-independent way, in a well-calibrated test, avoiding data splitting. This technique applies more broadly to general permutation-based MMD testing, and includes the use of deep kernels with features learnt using unsupervised models such as auto-encoders. We highlight the applicability of our MMD-FUSE test on both synthetic low-dimensional and real-world high-dimensional data, and compare its performance in terms of power against current state-of-the-art kernel tests.
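A minimal sketch of the combination step, assuming Gaussian kernels indexed by bandwidth and uniform weights; the paper's normalization of the per-kernel statistics and its permutation-based calibration are omitted, and all names are illustrative.

```python
import numpy as np
from scipy.special import logsumexp

def mmd2_biased(X, Y, bandwidth):
    """Biased squared-MMD estimate with a Gaussian kernel."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * bandwidth ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

def fused_statistic(X, Y, bandwidths, weights=None, temperature=1.0):
    """Weighted soft maximum (log-sum-exp) of per-kernel MMD statistics."""
    mmds = np.array([mmd2_biased(X, Y, h) for h in bandwidths])
    w = np.full(len(bandwidths), 1.0 / len(bandwidths)) if weights is None else weights
    return temperature * logsumexp(mmds / temperature, b=w)

rng = np.random.default_rng(1)
X, Y = rng.normal(0, 1, (100, 2)), rng.normal(0.5, 1, (100, 2))
stat = fused_statistic(X, Y, bandwidths=[0.5, 1.0, 2.0])
```

In a full test, the same statistic would be recomputed under label permutations of the pooled sample to obtain a calibrated p-value.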
    Exact Count of Boundary Pieces of ReLU Classifiers: Towards the Proper Complexity Measure for Classification. (arXiv:2306.08805v1 [stat.ML])
    Classic learning theory suggests that proper regularization is the key to good generalization and robustness. In classification, current training schemes only target the complexity of the classifier itself, which can be misleading and ineffective. Instead, we advocate directly measuring the complexity of the decision boundary. Existing literature is limited in this area with few well-established definitions of boundary complexity. As a proof of concept, we start by analyzing ReLU neural networks, whose boundary complexity can be conveniently characterized by the number of affine pieces. With the help of tropical geometry, we develop a novel method that can explicitly count the exact number of boundary pieces, and as a by-product, the exact number of total affine pieces. Numerical experiments are conducted and distinctive properties of our boundary complexity are uncovered. First, the boundary piece count appears largely independent of other measures, e.g., total piece count, and $l_2$ norm of weights, during the training process. Second, the boundary piece count is negatively correlated with robustness, where popular robust training techniques, e.g., adversarial training or random noise injection, are found to reduce the number of boundary pieces.
    Diffusion-based Conditional ECG Generation with Structured State Space Models. (arXiv:2301.08227v2 [eess.SP] UPDATED)
    Synthetic data generation is a promising solution to address privacy issues with the distribution of sensitive health data. Recently, diffusion models have set new standards for generative models for different data modalities. Also very recently, structured state space models emerged as a powerful modeling paradigm to capture long-term dependencies in time series. We put forward SSSD-ECG, as the combination of these two technologies, for the generation of synthetic 12-lead electrocardiograms conditioned on more than 70 ECG statements. Due to a lack of reliable baselines, we also propose conditional variants of two state-of-the-art unconditional generative models. We thoroughly evaluate the quality of the generated samples, by evaluating pretrained classifiers on the generated data and by evaluating the performance of a classifier trained only on synthetic data, where SSSD-ECG clearly outperforms its GAN-based competitors. We demonstrate the soundness of our approach through further experiments, including conditional class interpolation and a clinical Turing test demonstrating the high quality of the SSSD-ECG samples across a wide range of conditions.
    Shuffle SGD is Always Better than SGD: Improved Analysis of SGD with Arbitrary Data Orders. (arXiv:2305.19259v2 [cs.LG] UPDATED)
    Stochastic Gradient Descent (SGD) algorithms are widely used in optimizing neural networks, with Random Reshuffling (RR) and Single Shuffle (SS) being popular choices for cycling through random or single permutations of the training data. However, the convergence properties of these algorithms in the non-convex case are not fully understood. Existing results suggest that, in realistic training scenarios where the number of epochs is smaller than the training set size, RR may perform worse than SGD. In this paper, we analyze a general SGD algorithm that allows for arbitrary data orderings and show improved convergence rates for non-convex functions. Specifically, our analysis reveals that SGD with random and single shuffling is always faster or at least as good as classical SGD with replacement, regardless of the number of iterations. Overall, our study highlights the benefits of using SGD with random/single shuffling and provides new insights into its convergence properties for non-convex optimization.
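For concreteness, the three data orderings compared can be sketched as index streams; the function and scheme names are illustrative.

```python
import numpy as np

def index_stream(n, epochs, scheme, rng):
    """Yield example indices under three orderings of the training data:
    'rr'  - Random Reshuffling: a fresh permutation every epoch,
    'ss'  - Single Shuffle: one permutation reused across all epochs,
    'iid' - classical SGD: sampling with replacement at every step."""
    perm = rng.permutation(n)  # fixed permutation used by 'ss'
    for _ in range(epochs):
        if scheme == "rr":
            perm = rng.permutation(n)
        for t in range(n):
            yield perm[t] if scheme in ("rr", "ss") else rng.integers(n)

rng = np.random.default_rng(0)
order = list(index_stream(n=5, epochs=2, scheme="rr", rng=rng))
```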
    Class-Conditional Conformal Prediction With Many Classes. (arXiv:2306.09335v1 [stat.ML])
    Standard conformal prediction methods provide a marginal coverage guarantee, which means that for a random test point, the conformal prediction set contains the true label with a user-chosen probability. In many classification problems, we would like to obtain a stronger guarantee -- that for test points of a specific class, the prediction set contains the true label with the same user-chosen probability. Existing conformal prediction methods do not work well when there is a limited amount of labeled data per class, as is often the case in real applications where the number of classes is large. We propose a method called clustered conformal prediction, which clusters together classes that have "similar" conformal scores and then performs conformal prediction at the cluster level. Based on empirical evaluation across four image data sets with many (up to 1000) classes, we find that clustered conformal typically outperforms existing methods in terms of class-conditional coverage and set size metrics.
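A minimal sketch of the clustering-then-calibration idea, under the assumptions that calibration scores and labels are given, every class has calibration examples, and classes are embedded by the empirical quantiles of their score distributions; the paper's exact embedding and tie-breaking details may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

def clustered_conformal(scores, labels, n_classes, n_clusters=10, alpha=0.1):
    """Cluster classes with similar conformal score distributions, then
    compute one conformal quantile per cluster instead of per class,
    pooling calibration data across sparse classes.

    Returns a per-class score threshold at level 1 - alpha."""
    qs = np.linspace(0.1, 0.9, 5)
    # Embed each class by the empirical quantiles of its calibration scores.
    emb = np.stack([np.quantile(scores[labels == c], qs) for c in range(n_classes)])
    cluster_of = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(emb)
    thresholds = np.empty(n_classes)
    for g in range(n_clusters):
        pooled = scores[np.isin(labels, np.where(cluster_of == g)[0])]
        # Finite-sample-corrected conformal quantile.
        k = int(np.ceil((len(pooled) + 1) * (1 - alpha)))
        thresholds[cluster_of == g] = np.sort(pooled)[min(k, len(pooled)) - 1]
    return thresholds

# A test point's prediction set: all classes whose score <= that class's threshold.
```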
    Noise Stability Optimization for Flat Minima with Optimal Convergence Rates. (arXiv:2306.08553v1 [cs.LG])
We consider finding flat local minimizers by adding average weight perturbations. Given a nonconvex function $f: \mathbb{R}^d \rightarrow \mathbb{R}$ and a $d$-dimensional distribution $\mathcal{P}$ which is symmetric at zero, we perturb the weight of $f$ and define $F(W) = \mathbb{E}[f({W + U})]$, where $U$ is a random sample from $\mathcal{P}$. This injection induces regularization through the Hessian trace of $f$ for small, isotropic Gaussian perturbations. Thus, the weight-perturbed function is biased toward minimizers with low Hessian trace. Several prior works have studied settings related to this weight-perturbed function by designing algorithms to improve generalization. Still, convergence rates are not known for finding minima under the average perturbations of the function $F$. This paper considers an SGD-like algorithm that injects random noise before computing gradients while leveraging the symmetry of $\mathcal{P}$ to reduce variance. We then provide a rigorous analysis, showing matching upper and lower bounds of our algorithm for finding an approximate first-order stationary point of $F$ when the gradient of $f$ is Lipschitz-continuous. We empirically validate our algorithm for several image classification tasks with various architectures. Compared to sharpness-aware minimization, we note a 12.6% and 7.8% drop in the Hessian trace and top eigenvalue of the found minima, respectively, averaged over eight datasets. Ablation studies validate the benefit of the design of our algorithm.
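A minimal PyTorch sketch of the noise-injected step, exploiting the symmetry of $\mathcal{P}$ via antithetic pairs $\pm U$ to reduce the variance of the gradient estimate of $F$; hyperparameters, the Gaussian choice of noise, and the function names are illustrative, not the paper's exact algorithm.

```python
import torch

def noise_stability_step(params, loss_fn, lr=1e-2, sigma=1e-2, n_pairs=1):
    """One SGD-like step on F(W) = E[f(W + U)] with U ~ N(0, sigma^2 I).

    Averaging gradients at W + U and W - U (an antithetic pair) keeps the
    estimator unbiased for grad F while reducing its variance."""
    grads = [torch.zeros_like(p) for p in params]
    for _ in range(n_pairs):
        noise = [sigma * torch.randn_like(p) for p in params]
        for sign in (+1.0, -1.0):
            with torch.no_grad():
                for p, u in zip(params, noise):
                    p.add_(sign * u)          # perturb weights by +/- U
            loss = loss_fn()                  # evaluate f at perturbed weights
            g = torch.autograd.grad(loss, params)
            with torch.no_grad():
                for acc, gi in zip(grads, g):
                    acc.add_(gi / (2 * n_pairs))
                for p, u in zip(params, noise):
                    p.sub_(sign * u)          # undo the perturbation
    with torch.no_grad():
        for p, acc in zip(params, grads):
            p.sub_(lr * acc)                  # descent step on the averaged gradient
```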
    Extrapolated cross-validation for randomized ensembles. (arXiv:2302.13511v2 [stat.ME] UPDATED)
Ensemble methods such as bagging and random forests are ubiquitous in various fields, from finance to genomics. Despite their prevalence, the question of the efficient tuning of ensemble parameters has received relatively little attention. This paper introduces a cross-validation method, ECV (Extrapolated Cross-Validation), for tuning the ensemble and subsample sizes in randomized ensembles. Our method builds on two primary ingredients: initial estimators for small ensemble sizes using out-of-bag errors and a novel risk extrapolation technique that leverages the structure of prediction risk decomposition. By establishing uniform consistency of our risk extrapolation technique over ensemble and subsample sizes, we show that ECV yields $\delta$-optimal (with respect to the oracle-tuned risk) ensembles for squared prediction risk. Our theory accommodates general ensemble predictors, only requires mild moment assumptions, and allows for high-dimensional regimes where the feature dimension grows with the sample size. As a practical case study, we employ ECV to predict surface protein abundances from gene expressions in single-cell multiomics using random forests. In comparison to sample-split cross-validation and $K$-fold cross-validation, ECV achieves higher accuracy while avoiding sample splitting. At the same time, its computational cost is considerably lower owing to the use of the risk extrapolation technique. Additional numerical results validate the finite-sample accuracy of ECV for several common ensemble predictors under a computational constraint on the maximum ensemble size.
    A Heavy-Tailed Algebra for Probabilistic Programming. (arXiv:2306.09262v1 [stat.ML])
    Despite the successes of probabilistic models based on passing noise through neural networks, recent work has identified that such methods often fail to capture tail behavior accurately, unless the tails of the base distribution are appropriately calibrated. To overcome this deficiency, we propose a systematic approach for analyzing the tails of random variables, and we illustrate how this approach can be used during the static analysis (before drawing samples) pass of a probabilistic programming language compiler. To characterize how the tails change under various operations, we develop an algebra which acts on a three-parameter family of tail asymptotics and which is based on the generalized Gamma distribution. Our algebraic operations are closed under addition and multiplication; they are capable of distinguishing sub-Gaussians with differing scales; and they handle ratios sufficiently well to reproduce the tails of most important statistical distributions directly from their definitions. Our empirical results confirm that inference algorithms that leverage our heavy-tailed algebra attain superior performance across a number of density modeling and variational inference tasks.
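    A toy illustration of the flavor of such an algebra (a hedged sketch, not the paper's three-parameter construction): for subexponential tails of the form $\exp(-\sigma x^\alpha)$, a sum of independent variables inherits the heavier tail, so an addition rule can simply keep the dominant parameters:

        from dataclasses import dataclass

        @dataclass(frozen=True)
        class Tail:
            alpha: float  # decay exponent in the tail asymptotic exp(-sigma * x**alpha)
            sigma: float  # decay scale

            def __add__(self, other):
                # For subexponential tails, the sum of independent variables
                # inherits the heavier tail: smaller alpha dominates; at equal
                # alpha, the smaller sigma (slower decay) dominates. Light-tailed
                # cases (e.g. Gaussians) need different rules.
                if self.alpha != other.alpha:
                    return self if self.alpha < other.alpha else other
                return Tail(self.alpha, min(self.sigma, other.sigma))

        # Tail(0.5, 1.0) + Tail(2.0, 1.0)  ->  Tail(alpha=0.5, sigma=1.0)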
    On Certified Generalization in Structured Prediction. (arXiv:2306.09112v1 [stat.ML])
    In structured prediction, target objects have rich internal structure which does not factorize into independent components and violates common i.i.d. assumptions. This challenge becomes apparent through the exponentially large output space in applications such as image segmentation or scene graph generation. We present a novel PAC-Bayesian risk bound for structured prediction wherein the rate of generalization scales not only with the number of structured examples but also with their size. The underlying assumption, conforming to ongoing research on generative models, is that data are generated by the Knothe-Rosenblatt rearrangement of a factorizing reference measure. This allows us to explicitly distill the structure between random output variables into a Wasserstein dependency matrix. Our work makes a preliminary step towards leveraging powerful generative models to establish generalization bounds for discriminative downstream tasks in the challenging setting of structured prediction.
    Batches Stabilize the Minimum Norm Risk in High Dimensional Overparameterized Linear Regression. (arXiv:2306.08432v1 [cs.LG])
    Learning algorithms that divide the data into batches are prevalent in many machine-learning applications, typically offering useful trade-offs between computational efficiency and performance. In this paper, we examine the benefits of batch-partitioning through the lens of a minimum-norm overparameterized linear regression model with isotropic Gaussian features. We suggest a natural small-batch version of the minimum-norm estimator, and derive an upper bound on its quadratic risk, showing it is inversely proportional to the noise level as well as to the overparameterization ratio, for the optimal choice of batch size. In contrast to minimum-norm, our estimator admits a stable risk behavior that is monotonically increasing in the overparameterization ratio, eliminating both the blowup at the interpolation point and the double-descent phenomenon. Interestingly, we observe that this implicit regularization offered by the batch partition is partially explained by feature overlap between the batches. Our bound is derived via a novel combination of techniques, in particular normal approximation in the Wasserstein metric of noisy projections over random subspaces.
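    One plausible reading of the small-batch minimum-norm estimator is to average per-batch minimum-norm solutions; a numpy sketch under that assumption (the paper's construction may differ in details):

        import numpy as np

        def batched_min_norm(X, y, batch_size):
            # Average of per-batch minimum-norm least-squares solutions.
            estimates = []
            for start in range(0, X.shape[0], batch_size):
                Xb, yb = X[start:start + batch_size], y[start:start + batch_size]
                estimates.append(np.linalg.pinv(Xb) @ yb)  # min-norm solution per batch
            return np.mean(estimates, axis=0)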
    Predicting Real-time Crash Risks during Hurricane Evacuation Using Connected Vehicle Data. (arXiv:2306.08682v1 [stat.ML])
    Hurricane evacuation, ordered to save the lives of people in coastal regions, generates high traffic demand with increased crash risk. To mitigate such risk, transportation agencies need to anticipate highway locations with high crash risks to deploy appropriate countermeasures. With ubiquitous sensors and communication technologies, it is now possible to retrieve micro-level vehicular data containing individual vehicle trajectory and speed information. Such high-resolution vehicle data, potentially available in real time, can be used to assess prevailing traffic safety conditions. Using vehicle speed and acceleration profiles, potential crash risks can be predicted in real time. Previous studies on real-time crash risk prediction mainly used data from infrastructure-based sensors, which may not cover many road segments. In this paper, we present methods to determine potential crash risks during hurricane evacuation from an emerging alternative data source known as connected vehicle data. Such data contain vehicle location, speed, and acceleration information collected at a very high frequency (at intervals shorter than 30 seconds). To predict potential crash risks, we utilized a dataset collected during the evacuation period of Hurricane Ida on Interstate-10 (I-10) in the state of Louisiana. Multiple machine learning models were trained considering weather features and different traffic characteristics extracted from the connected vehicle data in 5-minute intervals. The results indicate that the Gaussian Process Boosting (GPBoost) and Extreme Gradient Boosting (XGBoost) models perform better (recall = 0.91) than other models. Real-time connected vehicle data for crash risk assessment will allow traffic managers to use resources efficiently and take proactive safety measures.
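    A hedged sketch of this modeling setup with xgboost, assuming a binary crash label per 5-minute interval; the feature names (mean_speed, std_speed, max_decel, rain_intensity) and the input table `df` are hypothetical stand-ins for the aggregates described above:

        import xgboost as xgb
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import recall_score

        # df: assumed pandas DataFrame, one row per (segment, 5-minute interval).
        features = ["mean_speed", "std_speed", "max_decel", "rain_intensity"]
        X_tr, X_te, y_tr, y_te = train_test_split(
            df[features], df["crash"], stratify=df["crash"], test_size=0.2)
        model = xgb.XGBClassifier(
            n_estimators=300, max_depth=6,
            # up-weight the rare crash class
            scale_pos_weight=(y_tr == 0).sum() / (y_tr == 1).sum())
        model.fit(X_tr, y_tr)
        print("recall:", recall_score(y_te, model.predict(X_te)))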
    Integrating Uncertainty Awareness into Conformalized Quantile Regression. (arXiv:2306.08693v1 [stat.ME])
    Conformalized Quantile Regression (CQR) is a recently proposed method for constructing prediction intervals for a response $Y$ given covariates $X$, without making distributional assumptions. However, as we demonstrate empirically, existing constructions of CQR can be ineffective for problems where the quantile regressors perform better in certain parts of the feature space than others. The reason is that the prediction intervals of CQR do not distinguish between two forms of uncertainty: first, the variability of the conditional distribution of $Y$ given $X$ (i.e., aleatoric uncertainty), and second, our uncertainty in estimating this conditional distribution (i.e., epistemic uncertainty). This can lead to uneven coverage, with intervals that are overly wide (or overly narrow) in regions where epistemic uncertainty is low (or high). To address this, we propose a new variant of the CQR methodology, Uncertainty-Aware CQR (UACQR), that explicitly separates these two sources of uncertainty to adjust quantile regressors differentially across the feature space. Compared to CQR, our methods enjoy the same distribution-free theoretical guarantees for coverage properties, while demonstrating in our experiments stronger conditional coverage in simulated settings and tighter intervals on a range of real-world data sets.
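    For reference, the vanilla CQR conformalization step that UACQR modifies, in a few lines (the textbook version of Romano et al. 2019, not the paper's uncertainty-aware variant; it assumes the calibration set is large enough for the finite-sample quantile):

        import numpy as np

        def cqr_intervals(q_lo_cal, q_hi_cal, y_cal, q_lo_test, q_hi_test, alpha=0.1):
            # Conformity score: how far y falls outside the estimated interval.
            scores = np.maximum(q_lo_cal - y_cal, y_cal - q_hi_cal)
            n = len(y_cal)
            qhat = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n)
            # Widen (or shrink, if qhat < 0) both endpoints by the same amount.
            return q_lo_test - qhat, q_hi_test + qhat

    Note the correction term qhat is a single constant across the feature space, which is exactly the limitation the abstract describes.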
    Private Federated Learning Without a Trusted Server: Optimal Algorithms for Convex Losses. (arXiv:2106.09779v8 [cs.LG] UPDATED)
    This paper studies federated learning (FL)--especially cross-silo FL--with data from people who do not trust the server or other silos. In this setting, each silo (e.g. hospital) has data from different people (e.g. patients) and must maintain the privacy of each person's data (e.g. medical record), even if the server or other silos act as adversarial eavesdroppers. This requirement motivates the study of Inter-Silo Record-Level Differential Privacy (ISRL-DP), which requires silo i's communications to satisfy record/item-level differential privacy (DP). ISRL-DP ensures that the data of each person (e.g. patient) in silo i (e.g. hospital i) cannot be leaked. ISRL-DP is different from well-studied privacy notions. Central and user-level DP assume that people trust the server/other silos. On the other end of the spectrum, local DP assumes that people do not trust anyone at all (even their own silo). Sitting between central and local DP, ISRL-DP makes the realistic assumption (in cross-silo FL) that people trust their own silo, but not the server or other silos. In this work, we provide tight (up to logarithms) upper and lower bounds for ISRL-DP FL with convex/strongly convex loss functions and homogeneous (i.i.d.) silo data. Remarkably, we show that similar bounds are attainable for smooth losses with arbitrary heterogeneous silo data distributions, via an accelerated ISRL-DP algorithm. We also provide tight upper and lower bounds for ISRL-DP federated empirical risk minimization, and use acceleration to attain the optimal bounds in fewer rounds of communication than the state-of-the-art. Finally, with a secure "shuffler" to anonymize silo messages (but without a trusted server), our algorithm attains the optimal central DP rates under more practical trust assumptions. Numerical experiments show favorable privacy-accuracy tradeoffs for our algorithm in classification and regression tasks.
    Multi-class Graph Clustering via Approximated Effective $p$-Resistance. (arXiv:2306.08617v1 [cs.LG])
    This paper develops an approximation to the (effective) $p$-resistance and applies it to multi-class clustering. Spectral methods based on the graph Laplacian and its generalization to the graph $p$-Laplacian have been a backbone of non-Euclidean clustering techniques. The advantage of the $p$-Laplacian is that the parameter $p$ induces a controllable bias on cluster structure. The drawback of $p$-Laplacian eigenvector based methods is that the third and higher eigenvectors are difficult to compute. Thus, instead, we are motivated to use the $p$-resistance induced by the $p$-Laplacian for clustering. For $p$-resistance, small $p$ biases towards clusters with high internal connectivity, while large $p$ biases towards clusters of small "extent," that is, a preference for smaller shortest-path distances between vertices in the cluster. However, the $p$-resistance is expensive to compute. We overcome this by developing an approximation to the $p$-resistance. We prove upper and lower bounds on this approximation and observe that it is exact when the graph is a tree. We also provide theoretical justification for the use of $p$-resistance for clustering. Finally, we provide experiments comparing our approximated $p$-resistance clustering to other $p$-Laplacian based methods.
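    For intuition, the $p = 2$ case reduces to the classical effective resistance, computable from the Laplacian pseudoinverse; a small numpy sketch (this does not cover the general-$p$ approximation the paper develops):

        import numpy as np

        def effective_resistance(adjacency):
            # Pairwise effective resistance for p = 2 via the Laplacian pseudoinverse:
            # R_ij = Lp_ii + Lp_jj - 2 * Lp_ij.
            L = np.diag(adjacency.sum(axis=1)) - adjacency
            Lp = np.linalg.pinv(L)
            d = np.diag(Lp)
            return d[:, None] + d[None, :] - 2.0 * Lp

        # Clustering idea: treat -R (or exp(-R)) as a similarity matrix and feed it
        # to a standard routine such as spectral clustering or k-medoids.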
    Fit Like You Sample: Sample-Efficient Generalized Score Matching from Fast Mixing Markov Chains. (arXiv:2306.09332v1 [cs.DS])
    Score matching is an approach to learning probability distributions parametrized up to a constant of proportionality (e.g., energy-based models). The idea is to fit the score of the distribution, rather than the likelihood, thus avoiding the need to evaluate the constant of proportionality. While there is a clear algorithmic benefit, the statistical "cost" can be steep: recent work by Koehler et al. 2022 showed that for distributions that have poor isoperimetric properties (a large Poincar\'e or log-Sobolev constant), score matching is substantially statistically less efficient than maximum likelihood. However, many natural, realistic distributions -- e.g., multimodal distributions as simple as a mixture of two Gaussians in one dimension -- have a poor Poincar\'e constant. In this paper, we show a close connection between the mixing time of an arbitrary Markov process with generator $\mathcal{L}$ and a generalized score matching loss that tries to fit $\frac{\mathcal{L} p}{p}$. If $\mathcal{L}$ is the generator of a Markov process corresponding to a continuous version of simulated tempering, we show the corresponding generalized score matching loss is a Gaussian-convolution annealed score matching loss, akin to the one proposed in Song and Ermon 2019. Moreover, we show that if the distribution being learned is a finite mixture of Gaussians in $d$ dimensions with a shared covariance, the sample complexity of annealed score matching is polynomial in the ambient dimension, the diameter of the means, and the smallest and largest eigenvalues of the covariance -- obviating the Poincar\'e constant-based lower bounds of the basic score matching loss shown in Koehler et al. 2022. This is the first result characterizing the benefits of annealing for score matching -- a crucial component in more sophisticated score-based approaches like Song and Ermon 2019.
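    A minimal PyTorch sketch of the Gaussian-convolution annealed (denoising) score matching objective in the style of Song and Ermon 2019, which the abstract connects to tempering; `score_net(x, sigma)` is an assumed interface estimating $\nabla_x \log p_\sigma(x)$:

        import torch

        def annealed_dsm_loss(score_net, x, sigmas=(1.0, 0.5, 0.1)):
            # Denoising score matching averaged over a ladder of noise levels.
            loss = 0.0
            for sigma in sigmas:
                noise = torch.randn_like(x) * sigma
                x_noisy = x + noise
                target = -noise / sigma**2       # score of the Gaussian perturbation
                pred = score_net(x_noisy, sigma)
                # sigma**2 weighting keeps the terms on a comparable scale.
                loss = loss + sigma**2 * ((pred - target) ** 2).sum(dim=-1).mean()
            return loss / len(sigmas)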
    Safeguarding Data in Multimodal AI: A Differentially Private Approach to CLIP Training. (arXiv:2306.08173v1 [cs.LG])
    The surge in multimodal AI's success has sparked concerns over data privacy in vision-and-language tasks. While CLIP has revolutionized multimodal learning through joint training on images and text, its potential to unintentionally disclose sensitive information necessitates the integration of privacy-preserving mechanisms. We introduce a differentially private adaptation of the Contrastive Language-Image Pretraining (CLIP) model that effectively addresses privacy concerns while retaining accuracy. Our proposed method, Dp-CLIP, is rigorously evaluated on benchmark datasets encompassing diverse vision-and-language tasks such as image classification and visual question answering. We demonstrate that our approach retains performance on par with the standard non-private CLIP model. Furthermore, we analyze our proposed algorithm under linear representation settings. We derive the convergence rate of our algorithm and show a trade-off between utility and privacy when gradients are clipped per-batch and the loss function does not satisfy smoothness conditions assumed in the literature for the analysis of DP-SGD.  ( 2 min )
    Provably Efficient Offline Reinforcement Learning with Perturbed Data Sources. (arXiv:2306.08364v1 [stat.ML])
    Existing theoretical studies on offline reinforcement learning (RL) mostly consider a dataset sampled directly from the target task. In practice, however, data often come from several heterogeneous but related sources. Motivated by this gap, this work aims at rigorously understanding offline RL with multiple datasets that are collected from randomly perturbed versions of the target task instead of from itself. An information-theoretic lower bound is derived, which reveals a necessary requirement on the number of involved sources in addition to that on the number of data samples. Then, a novel HetPEVI algorithm is proposed, which simultaneously considers the sample uncertainties from a finite number of data samples per data source and the source uncertainties due to a finite number of available data sources. Theoretical analyses demonstrate that HetPEVI can solve the target task as long as the data sources collectively provide a good data coverage. Moreover, HetPEVI is demonstrated to be optimal up to a polynomial factor of the horizon length. Finally, the study is extended to offline Markov games and offline robust RL, which demonstrates the generality of the proposed designs and theoretical analyses.  ( 2 min )
    Implicit Compressibility of Overparametrized Neural Networks Trained with Heavy-Tailed SGD. (arXiv:2306.08125v1 [stat.ML])
    Neural network compression has been an increasingly important subject, due to its practical implications in terms of reducing the computational requirements and its theoretical implications, as there is an explicit connection between compressibility and the generalization error. Recent studies have shown that the choice of the hyperparameters of stochastic gradient descent (SGD) can have an effect on the compressibility of the learned parameter vector. Even though these results have shed some light on the role of the training dynamics over compressibility, they relied on unverifiable assumptions and the resulting theory does not provide a practical guideline due to its implicitness. In this study, we propose a simple modification for SGD, such that the outputs of the algorithm will be provably compressible without making any nontrivial assumptions. We consider a one-hidden-layer neural network trained with SGD and we inject additive heavy-tailed noise to the iterates at each iteration. We then show that, for any compression rate, there exists a level of overparametrization (i.e., the number of hidden units), such that the output of the algorithm will be compressible with high probability. To achieve this result, we make two main technical contributions: (i) we build on a recent study on stochastic analysis and prove a 'propagation of chaos' result with improved rates for a class of heavy-tailed stochastic differential equations, and (ii) we derive strong-error estimates for their Euler discretization. We finally illustrate our approach on experiments, where the results suggest that the proposed approach achieves compressibility with a slight compromise from the training and test error.  ( 3 min )
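    A sketch of the proposed modification as described, namely plain SGD plus additive heavy-tailed (alpha-stable) noise on the iterates, using scipy's symmetric alpha-stable sampler; the step size and noise scale here are illustrative, not the paper's:

        import numpy as np
        from scipy.stats import levy_stable

        def heavy_tailed_sgd_step(w, grad_fn, lr=0.01, alpha=1.8, noise_scale=1e-3):
            # One SGD iterate with additive symmetric alpha-stable noise
            # (alpha < 2 gives heavy tails; alpha = 2 recovers Gaussian noise).
            noise = levy_stable.rvs(alpha, beta=0.0, size=w.shape)
            return w - lr * grad_fn(w) + noise_scale * noise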
    Nonparametric regression using over-parameterized shallow ReLU neural networks. (arXiv:2306.08321v1 [stat.ML])
    It is shown that over-parameterized neural networks can achieve minimax optimal rates of convergence (up to logarithmic factors) for learning functions from certain smooth function classes, if the weights are suitably constrained or regularized. Specifically, we consider the nonparametric regression of estimating an unknown $d$-variate function by using shallow ReLU neural networks. It is assumed that the regression function is from the H\"older space with smoothness $\alpha<(d+3)/2$ or a variation space corresponding to shallow neural networks, which can be viewed as an infinitely wide neural network. In this setting, we prove that least squares estimators based on shallow neural networks with certain norm constraints on the weights are minimax optimal, if the network width is sufficiently large. As a byproduct, we derive a new size-independent bound for the local Rademacher complexity of shallow ReLU neural networks, which may be of independent interest.  ( 2 min )
    Differentially Private Wireless Federated Learning Using Orthogonal Sequences. (arXiv:2306.08280v1 [cs.IT])
    We propose a novel privacy-preserving uplink over-the-air computation (AirComp) method, termed FLORAS, for single-input single-output (SISO) wireless federated learning (FL) systems. From the communication design perspective, FLORAS eliminates the requirement of channel state information at the transmitters (CSIT) by leveraging the properties of orthogonal sequences. From the privacy perspective, we prove that FLORAS can offer both item-level and client-level differential privacy (DP) guarantees. Moreover, by adjusting the system parameters, FLORAS can flexibly achieve different DP levels at no additional cost. A novel FL convergence bound is derived which, combined with the privacy guarantees, allows for a smooth tradeoff between convergence rate and differential privacy levels. Numerical results demonstrate the advantages of FLORAS compared with the baseline AirComp method, and validate that our analytical results can guide the design of privacy-preserving FL with different tradeoff requirements on the model convergence and privacy levels.  ( 2 min )
    Unbiased Learning of Deep Generative Models with Structured Discrete Representations. (arXiv:2306.08230v1 [cs.LG])
    By composing graphical models with deep learning architectures, we learn generative models with the strengths of both frameworks. The structured variational autoencoder (SVAE) inherits structure and interpretability from graphical models, and flexible likelihoods for high-dimensional data from deep learning, but poses substantial optimization challenges. We propose novel algorithms for learning SVAEs, and are the first to demonstrate the SVAE's ability to handle multimodal uncertainty when data is missing by incorporating discrete latent variables. Our memory-efficient implicit differentiation scheme makes the SVAE tractable to learn via gradient descent, while demonstrating robustness to incomplete optimization. To more rapidly learn accurate graphical model parameters, we derive a method for computing natural gradients without manual derivations, which avoids biases found in prior work. These optimization innovations enable the first comparisons of the SVAE to state-of-the-art time series models, where the SVAE performs competitively while learning interpretable and structured discrete data representations.  ( 2 min )
    Bayesian Non-linear Latent Variable Modeling via Random Fourier Features. (arXiv:2306.08352v1 [stat.ML])
    The Gaussian process latent variable model (GPLVM) is a popular probabilistic method used for nonlinear dimension reduction, matrix factorization, and state-space modeling. Inference for GPLVMs is computationally tractable only when the data likelihood is Gaussian. Moreover, inference for GPLVMs has typically been restricted to obtaining maximum a posteriori point estimates, which can lead to overfitting, or variational approximations, which mischaracterize the posterior uncertainty. Here, we present a method to perform Markov chain Monte Carlo (MCMC) inference for generalized Bayesian nonlinear latent variable modeling. The crucial insight necessary to generalize GPLVMs to arbitrary observation models is that we approximate the kernel function in the Gaussian process mappings with random Fourier features; this allows us to compute the gradient of the posterior in closed form with respect to the latent variables. We show that we can generalize GPLVMs to non-Gaussian observations, such as Poisson, negative binomial, and multinomial distributions, using our random feature latent variable model (RFLVM). Our generalized RFLVMs perform on par with state-of-the-art latent variable models on a wide range of applications, including motion capture, images, and text data for the purpose of estimating the latent structure and imputing the missing data of these complex data sets.  ( 2 min )
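    The random Fourier feature approximation at the heart of this construction, for an RBF kernel (the standard Rahimi and Recht recipe; the paper applies it inside the GP mappings):

        import numpy as np

        def rff_features(X, n_features=256, lengthscale=1.0, seed=0):
            # z(x) such that z(x) @ z(y) approximates the RBF kernel k(x, y).
            rng = np.random.default_rng(seed)
            d = X.shape[1]
            W = rng.normal(0.0, 1.0 / lengthscale, size=(d, n_features))
            b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
            return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

    Because the kernel mapping becomes an explicit finite-dimensional feature map, the posterior gradient with respect to the latent variables is available in closed form, which is what enables MCMC here.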
    Nearly Optimal Algorithms with Sublinear Computational Complexity for Online Kernel Regression. (arXiv:2306.08320v1 [cs.LG])
    The trade-off between regret and computational cost is a fundamental problem for online kernel regression, and previous algorithms addressing this trade-off could not keep optimal regret bounds at a sublinear computational complexity. In this paper, we propose two new algorithms, AOGD-ALD and NONS-ALD, which can keep nearly optimal regret bounds at a sublinear computational complexity, and we give sufficient conditions under which our algorithms work. Both algorithms dynamically maintain a set of nearly orthogonal basis functions used to approximate the kernel mapping, and keep nearly optimal regret bounds by controlling the approximation error. The number of basis functions depends on the approximation error and the decay rate of the eigenvalues of the kernel matrix. If the eigenvalues decay exponentially, then AOGD-ALD and NONS-ALD achieve a regret of $O(\sqrt{L(f)})$ and $O(\mathrm{d}_{\mathrm{eff}}(\mu)\ln{T})$, respectively, at a computational complexity in $O(\ln^2{T})$. If the eigenvalues decay polynomially with degree $p\geq 1$, then our algorithms keep the same regret bounds at a computational complexity in $o(T)$ in the case of $p>4$ and $p\geq 10$, respectively. $L(f)$ is the cumulative loss of $f$ and $\mathrm{d}_{\mathrm{eff}}(\mu)$ is the effective dimension of the problem. The two regret bounds are nearly optimal and are not comparable to each other.  ( 2 min )

  • Open

    How to install Bark with voice cloning locally?
    I want to install Bark with voice cloning locally and I can't find any installation help for this. Can someone please provide step-by-step instructions? I tried the Colab but that is very limited, and I can't get it to work properly; it'd be way easier if I just set it up locally. I don't know of any scripts for local installs either, so if someone could point me to a Python script to use a custom voice with Bark and run the TTS, that'd be great too. submitted by /u/SimRacer101 [link] [comments]  ( 8 min )
    Philippines: Tech Savvy Kids use AI to catch a thief caught on video
    Videos and pics taken from a Facebook post. Yesterday, in my hometown of Bacolod City, Philippines, a man stole a bag from two performing kids on stage. The theft was caught on video by one of the venue's security cameras. Commentators on social media used artificial intelligence to sharpen the image of the thief, and they sent it to the kids, who then gave it to the police. The authorities were able to recover the bag, but one cellphone was missing. The suspect is identified but still at large. https://i.redd.it/v80i1p3dln6b1.gif Link to original post and video https://www.facebook.com/lexchavezofficial/videos/1307441943456719/ AI-sharpened image. The update, translated as best I could and summarized: Thank you to all who helped and shared. We tried to secure our belongings but still bad things happened. Our bag was stolen with 2 cellphones, wallet, cameras and microphone. The police helped us recover the bag, but one cellphone was missing and money was missing from the wallet. https://www.facebook.com/lexchavezofficial/posts/pfbid02hQGmL8Q36ZVHPXXNkw3W26RFTuzeaGCNzLTBA5nQpqx9r1q9uS1itWeBUGFHYt2Fl submitted by /u/mrlechon [link] [comments]  ( 8 min )
    Would everyone use AI differently if your phone/computer was the power used to complete the task asked of it..?
    As above, so below... submitted by /u/Slugity [link] [comments]  ( 8 min )
    How to setup GPT Engineer to run from command line (quick tutorial)
    submitted by /u/mutantdustbunny [link] [comments]  ( 8 min )
    With voice cloning, is it possible to make the voice sound like a natural speaker in a different language?
    I'm looking into transcribing audio and having the speaker's voice cloned so I can have it speak in a different language. My question is: does it sound unnatural, and what are the best programs available to use, free or otherwise? submitted by /u/MrSneaky64 [link] [comments]  ( 8 min )
    I made a sci-fi short film about the dangers of AI using AI generated video
    submitted by /u/bopili [link] [comments]  ( 8 min )
    Day 8: I did some research and experimented with different prompts on Bing Image Creator
    https://preview.redd.it/kx64m56ntm6b1.jpg?width=1024&format=pjpg&auto=webp&s=0282bd3062215ba220009ee76ccfcc7f0f8b9d04 https://preview.redd.it/yt8f126ntm6b1.jpg?width=1024&format=pjpg&auto=webp&s=22c20a4d7f6574fb8bb6ee0146eb3f254367d16c https://preview.redd.it/4groo16ntm6b1.jpg?width=1024&format=pjpg&auto=webp&s=46087087557e8e2ed7148e13c6dc12a06a10dcfb https://preview.redd.it/okm8jn6ntm6b1.jpg?width=1024&format=pjpg&auto=webp&s=bd36ba5923e937aa2a209d1af825fae00e9548ae https://preview.redd.it/5pepmq6ntm6b1.jpg?width=1024&format=pjpg&auto=webp&s=3b20d02e52bbdb988309184fa40d65bc075b7d4e https://preview.redd.it/arj38r6ntm6b1.jpg?width=1024&format=pjpg&auto=webp&s=db6ef0bd49536761c6d384e9b63084d772beaaa4 https://preview.redd.it/1127376ntm6b1.jpg?width=1024&format=pjpg&auto=webp&s=735898d6eeb65726129cfed370d9ab0782e1fac4 submitted by /u/Blaze_furyX [link] [comments]  ( 8 min )
    Day 8: I did some research and experimented with different prompts on Bing Image Creator
    https://preview.redd.it/yhi8x4egtm6b1.jpg?width=1024&format=pjpg&auto=webp&s=4aa067fa7681fe50a2a877dfef1bf5fa30d88e32 https://preview.redd.it/3316wjdgtm6b1.jpg?width=1024&format=pjpg&auto=webp&s=990b9973bdd7777def3f044e5ed933040ce6320b https://preview.redd.it/0o7mv5egtm6b1.jpg?width=1024&format=pjpg&auto=webp&s=96110417a2c4b42cfae2b744957d64be40ebc7b8 https://preview.redd.it/p8t0d6egtm6b1.jpg?width=1024&format=pjpg&auto=webp&s=22e1914b32329a020ab389b7c20fa55597c9bbe1 https://preview.redd.it/es1uu6egtm6b1.jpg?width=1024&format=pjpg&auto=webp&s=56bf96dc3109a8c7c7b0ea477bb230dcaac966ac https://preview.redd.it/ldo3j8egtm6b1.jpg?width=1024&format=pjpg&auto=webp&s=6521986816015c3dba265f7f6a08442620143b8a https://preview.redd.it/9kk6vodgtm6b1.jpg?width=1024&format=pjpg&auto=webp&s=d432b048c874d4fe2110efb1656678888742b958 https://preview.redd.it/3mazuaegtm6b1.jpg?width=1024&format=pjpg&auto=webp&s=9e8f54868ce7830532c189b41d92169ff5ec7ee5 submitted by /u/Blaze_furyX [link] [comments]  ( 8 min )
    AI for food insecurity and affordable healthy diets.
    Hey everyone! I'm giving a presentation at my work about the potential of AI and how it could be helpful to certain communities. Most of the tools I've noticed are super helpful when it comes to office work or productivity, but I wanted to ask if y'all think it's possible for an AI model to be fine-tuned on food recipes and to recommend meals and snacks based on the user's preferences. Like, it asks what my monthly budget is, my caloric requirement, provides prices from available grocery sites, allergies, and what the user owns or doesn't own already. I just know from experience that I didn't really know of calorie-dense foods in undergrad and could have been eating healthier while also getting the necessary energy and saving money. Do you think AI is at that point yet? submitted by /u/Content_Test_8910 [link] [comments]  ( 8 min )
    Anyone know what AI tool he used to make Spider Man Miles Morales look like this?
    submitted by /u/Evannnnnm [link] [comments]  ( 8 min )
    ‘Like science fiction,’ Seattle startup sends laser-equipped robots to zap weeds on farmland
    submitted by /u/JohnnyMnemo [link] [comments]  ( 8 min )
    Descript alternative for voice cloning a video game character?
    Just been learning Descript and spent hours cutting up video game footage to get 10 minutes of a character talking to test out voice cloning to make the character say different things. When I submit training data, there's a disclaimer about only using your own voice. I didn't realise this but I understand it makes sense. My question is, are there any other alternatives? For example, how are these AI songs getting put out there? submitted by /u/SupremeFlamer [link] [comments]  ( 8 min )
    AI tool for video resolution upscale?
    Does anyone know of any video resolution upscaling AI? There is "nightmareai" for upscaling photos, and I'm wondering if something like that exists for videos as well. submitted by /u/ZipoxD [link] [comments]  ( 8 min )
    I've Been Working On Something Exciting and I Want To Ask For Your Feedback
    So Imagine This Scenario: You Upload a Picture of Your FACE, and Create an AI DIGITAL HUMAN of Yourself. I was thinking about new ideas to add to my Chat with PDF tool. An idea struck me: wouldn't it be awesome if you could create an AI digital human of yourself and ask questions of your AI twin instead of a faceless ChatGPT?💡Imagine being able to ask questions to your own AI twin that gives you answers from your uploaded PDFs. This idea looked interesting to me, so I have started working on it. https://preview.redd.it/wxby7wqa8l6b1.jpg?width=955&format=pjpg&auto=webp&s=a5f6f2e73d29d130ad1609777e3c44c4ea779521 I'm eager to hear your thoughts on this new feature! Will it succeed or fail? submitted by /u/Brian-Hose225 [link] [comments]  ( 8 min )
    Is OpenAI's finetunable models what I need to create a tech support tool?
    Hello! I would like to train OpenAI's Davinci model on a company's knowledge base, which consists of a few hundred articles and transcribed videos. I want to use this model as a chat support helper tool that understands the user's questions and feeds me intelligent and factually correct replies. I provide tech support and would like to make my work easier. Their documentation on fine-tuning makes it seem like it's more about learning entirely new skills than just adding more information. It recommends using multiple examples, and the case study of a customer support chatbot uses a different structure from what I need: {"prompt":"Summary: \n\nSpecific information:\n\n###\n\nCustomer: \nAgent: \nCustomer: \nAgent:", "completion":" \n"} {"prompt":"Summary: \n\nSpecific information:\n\n###\n\nCustomer: \nAgent: \nCustomer: \nAgent: \nCustomer: \nAgent:", "completion":" \n"} Are their fine-tunable models what I'm looking for, or should I be using something else? What prompt structure should I use if I want it to feed answers to questions like ChatGPT would? Thanks a lot! submitted by /u/JoZeHgS [link] [comments]  ( 8 min )
    An optimistic outlook on AI
    Now I'd like to start off by saying that, apart from a surface-level understanding of computers, technology, and artificial intelligence, I am not qualified to speak on this subject at all. But I feel like most AI-focused subs (especially r/singularity) have a very pessimistic and dystopian prediction of the future of this technology, and I'd like to throw some ideas out in the other direction because I'm interested to see if anyone else has some polarising views. Now, I would describe myself as a creative person, so that's where most of my thoughts stem from. For example, I can imagine a future where AI-generated media makes it possible for anyone to make anything, which will of course be terrible at first. I can definitely imagine someone shitposting an entire movie or something, but once the initial no…  ( 9 min )
    Generative fill
    Are there any other free generative fill AIs similar to Photoshop's and DALL-E's outpainting? submitted by /u/Boatetye [link] [comments]  ( 8 min )
    Best free voice cloning?
    I really want to clone some voices and the only good one is ElevenLabs but that is paid. I tried out Tortoise but that was pretty bad. I am looking for a ElevenLabs competitor that is free/freemium. submitted by /u/SimRacer101 [link] [comments]  ( 8 min )
  • Open

    47.9% Robust Test Error on CIFAR10 with Adversarial Training and PyTorch
    Knowing how to compute adversarial examples from this previous article, it would be ideal to train models for which such adversarial examples do not exist. This is the goal of developing adversarially robust training procedures. In this article, I want to describe a particularly popular approach called adversarial training. The idea is to train on adversarial examples computed during training on-the-fly. I will also discuss a PyTorch implementation that obtains 47.9% robust test error — 52.1% robust accuracy — on CIFAR10 using a WRN-28-10 architecture. The post 47.9% Robust Test Error on CIFAR10 with Adversarial Training and PyTorch appeared first on David Stutz.  ( 10 min )
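    A minimal PyTorch sketch of the on-the-fly PGD inner loop that such adversarial training relies on (hyperparameters are illustrative; clamping x + delta to the valid pixel range is omitted for brevity):

        import torch

        def pgd_attack(model, x, y, loss_fn, eps=8/255, alpha=2/255, steps=7):
            # Projected gradient ascent on the loss within an L-infinity ball.
            delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
            for _ in range(steps):
                loss = loss_fn(model(x + delta), y)
                grad, = torch.autograd.grad(loss, delta)
                with torch.no_grad():
                    delta += alpha * grad.sign()   # ascend the loss
                    delta.clamp_(-eps, eps)        # project back into the ball
            return (x + delta).detach()

        # Adversarial training: replace clean batches with adversarial ones.
        # for x, y in loader:
        #     x_adv = pgd_attack(model, x, y, loss_fn)
        #     optimizer.zero_grad(); loss_fn(model(x_adv), y).backward(); optimizer.step()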
  • Open

    Set of orbits with the same average distance to sun
    Suppose a planet is in an elliptical orbit around the sun with semimajor axis a and  semiminor axis b. Then the average distance of the planet to the sun over time equals a(1 + e²/2) where the eccentricity e satisfies e² = 1 − b²/a². You can find a proof of this statement in [1]. […] Set of orbits with the same average distance to sun first appeared on John D. Cook.  ( 5 min )
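    The claim is easy to check numerically: sample times uniformly (i.e., take the mean anomaly uniform), solve Kepler's equation for the eccentric anomaly by fixed-point iteration, and average the distance r = a(1 − e cos E):

        import numpy as np

        a, e = 1.0, 0.6
        M = np.linspace(0, 2 * np.pi, 200_000)   # mean anomaly: uniform in time
        E = M.copy()
        for _ in range(50):                      # fixed-point solve of M = E - e sin E
            E = M + e * np.sin(E)
        r = a * (1 - e * np.cos(E))              # distance to the sun at each time
        print(r.mean(), a * (1 + e**2 / 2))      # both approximately 1.18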
  • Open

    The emoji of the future
    Some of the recent image-generating models have this thing where they can fill in the blank parts of images. It's handy when you want to show them exactly how to give you more of the same. Like these animal emoji. See if you can tell which ones I  ( 3 min )
    Bonus: More extrapolated emoji
    AI Weirdness: the strange side of machine learning  ( 2 min )
  • Open

    Could someone make colab of this awesome repo for Blind super resolution?
    GRL-Image-Restoration. I tried it on video files through the HolyWu VapourSynth port and the results are awesome. Could someone make a Colab of this repo for blind super resolution, or write clear instructions on how to set up and run inference in an Anaconda environment? Thank you very, very much in advance. submitted by /u/zelenooki87 [link] [comments]  ( 8 min )
    How Activation Function works in Non-Linear Data ?
    I am having trouble understanding how activation functions help in fitting nonlinear functions. Here is a visualization of two neurons with ReLU: https://pasteboard.co/LRbI8pjVlSRC.jpg https://www.desmos.com/calculator/y4jsrd2hms submitted by /u/god00speed [link] [comments]  ( 8 min )
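    One way to see it: each ReLU unit contributes a "hinge", and a weighted sum of hinges is a piecewise-linear curve, so adding units lets the network bend a straight line into a nonlinear shape. A tiny numpy illustration approximating y = x^2 with two fixed ReLU units:

        import numpy as np

        relu = lambda z: np.maximum(0.0, z)
        x = np.linspace(-1, 1, 200)

        # Two hand-picked hinges at x = -0.5 and x = 0.5; their weighted sum is a
        # piecewise-linear curve that already roughly tracks y = x**2.
        y_hat = 0.25 + 1.5 * relu(x - 0.5) + 1.5 * relu(-(x + 0.5))
        print(np.mean((y_hat - x**2) ** 2))  # small, and shrinks with more units

    Without the ReLU (or another nonlinearity), any stack of linear layers collapses to a single linear map, so no nonlinear target could be fit.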
  • Open

    Question about Transformer model input in RL
    Hello there! I'm currently trying to implement the GTrXL model with PPO, and I have a question regarding the correct input format for the model. After going through some of the source code, I noticed that the forward input is expected to have the shape `(batch size, 1, state size)`, where 1 represents the sequence length. I'm curious to know if this input format is the correct one for the model to function properly, because I thought that for a transformer model to work, the sequence length would have to be at least one or a few episodes long. Thanks! submitted by /u/dinhdat1110 [link] [comments]  ( 8 min )

  • Open

    One-Minute Daily AI News 6/16/2023
    Chinese tech giants ByteDance (TikTok), Tencent, Baidu, and Alibaba simply can’t get enough of Nvidia’s High-Performance Computing (HPC) products. This news comes from Chinese media, which reports that TikTok creator, ByteDance, alone, has already caught up (in pure dollar terms) to what the entire Chinese market ordered from Nvidia in 2022.[1] Chinese President Xi Jinping discussed the global rise of artificial intelligence with Bill Gates on Friday and said he welcomed U.S. firms including Microsoft bringing their AI tech to China.[2] Today, Meta has announced its latest generative AI model, following on the back of ImageBind is Voicebox, which is designed to help creators with its ability to perform speech generation tasks such as audio editing, sampling, and stylizing, even if it wasn’t specifically trained to do so through in-context learning.[3] AI-generated ‘Family Guy’ livestream banned after making a bomb threat.[4] Sources: [1] https://www.tomshardware.com/news/chinas-bytedance-has-gobbled-up-dollar1-billion-of-nvidia-gpus-for-ai-this-year [2] https://www.reuters.com/technology/chinas-xi-tells-bill-gates-he-welcomes-us-ai-tech-china-2023-06-16/ [3] https://www.neowin.net/news/meta-announces-voicebox-its-generative-ai-model-for-audio/ [4] https://www.nme.com/news/tv/ai-generated-family-guy-livestream-banned-after-making-a-bomb-threat-3457051 submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    I think everyone should watch this video
    While I think AI advancements are crazy, I think this video puts a lot in perspective. Just casually browsing, I think we are echo-chambering the idea that AI is about to take over, etc. We definitely should look to the future and maybe talk about putting protections in place, but while we are closer than ever, we are still nowhere near real general AI without another massive leap. https://www.youtube.com/watch?v=l7tWoPk25yU The video talks about the issues they found with the AI Go bot that was "unbeatable" but actually didn't even understand the most basic levels of the game, even though it was destroying grandmasters. submitted by /u/phatrequiem [link] [comments]  ( 8 min )
    What’s the best ai to generate accurate hands and feet???
    I have been looking around and can’t find anything that looks “human” submitted by /u/ExaminationNo6235 [link] [comments]  ( 8 min )
    Day 7: I did some research and experimented with different prompts on Bing Image Creator
    https://preview.redd.it/esvzoywjrf6b1.jpg?width=1024&format=pjpg&auto=webp&s=d8377cc8d55a929661e1bfa4f1364ffb699898f1 https://preview.redd.it/cd621lxjrf6b1.jpg?width=1024&format=pjpg&auto=webp&s=5a5a14b7155401713381a47c6e9e118fe1dbfbcf https://preview.redd.it/jl9yozwjrf6b1.jpg?width=1024&format=pjpg&auto=webp&s=e10b025891c22af7ad1d111aedb0d16f4769cfd5 https://preview.redd.it/llpz5mxjrf6b1.jpg?width=1024&format=pjpg&auto=webp&s=f0b0d94317df71f341f7c67de9a3c4ddd4d264bb https://preview.redd.it/2065vlxjrf6b1.jpg?width=1024&format=pjpg&auto=webp&s=71a00e14946118ccbef4e7d4211985c3f8ebdcf6 https://preview.redd.it/zrj75txjrf6b1.jpg?width=1024&format=pjpg&auto=webp&s=7685fbd05bca9ec8946ab0e2b7c83125bf1dfe0b submitted by /u/Blaze_furyX [link] [comments]  ( 8 min )
    Day 7: I did some research and experimented with different prompts on Bing Image Creator
    https://preview.redd.it/qy3lvvxarf6b1.jpg?width=1024&format=pjpg&auto=webp&s=bbdd1c1748ae2bb9da79248a680006cd544688d4 https://preview.redd.it/747lcryarf6b1.jpg?width=1024&format=pjpg&auto=webp&s=3a82e42dd63b4913ae21d61c7922ea131bd619f8 https://preview.redd.it/d9wulxxarf6b1.jpg?width=1024&format=pjpg&auto=webp&s=0c4a9f1e7e73353c52ac95807a8563971b32fb89 https://preview.redd.it/o6m9sjyarf6b1.jpg?width=1024&format=pjpg&auto=webp&s=e672894dc8525f9b0d0107f5015ee0b3a04e74e7 https://preview.redd.it/k37gqjyarf6b1.jpg?width=1024&format=pjpg&auto=webp&s=fcd42a2231a41018b217eb55546efc1ba7d11f27 https://preview.redd.it/q6gvt5zarf6b1.jpg?width=1024&format=pjpg&auto=webp&s=1b5d1c1067ea10986e1f12c07f169f147dcb449e https://preview.redd.it/xfc2kjyarf6b1.jpg?width=1024&format=pjpg&auto=webp&s=af09ef9c7302230f63a370292c05e5074c648816 https://preview.redd.it/onflyjyarf6b1.jpg?width=1024&format=pjpg&auto=webp&s=19f1c34218528d145bba5dc2bc0e7219074b0955 submitted by /u/Blaze_furyX [link] [comments]  ( 8 min )
    Will AI replace Accounting Jobs?
    Sorry if you get plenty of these questions, but I am looking into going off to university/college, and one of my options is accounting. However, I'm concerned that with the rise of AI over the past few months, this job field might soon be gone. Am I just needlessly worrying? submitted by /u/Snow_Mexican1 [link] [comments]  ( 8 min )
    How can one test this hypothesis using data and AI that "people like to eat what they had in parties as kids"
    Basically, the hypothesis is that we tend to like food that we had as a group when we were kids: for example, birthday party cake, candies, and ice cream. Juice and cupcakes are all party foods. If kids' party foods change to healthier options, like naturally colored salads and vegetables with multi-grain bread, then people's actual tastes will also change. submitted by /u/holihai [link] [comments]  ( 8 min )
    I'm looking for an AI tool that would create a basic financial forecast. Is there anything out there?
    submitted by /u/Dalembert [link] [comments]  ( 8 min )
    AI characters, without any content filtering (TruePerson AI)
    TruePerson AI, chat with Uncensored AI Characters Because of the many content restrictions of AI in general, we made an AI chat app that is able to bypass content limitations. The AI in itself is based upon GPT. However, it's been modified to be capable of generating any content and be as neutral as possible. On the app, you can choose among hundreds of AI characters and even create your own. For now, the results are just fine, but we are eager to hear more about your experience. Feel free to exploit it and push its limits ! 🤖 Download on Play Store 🍏 Download on App Store 💜 Join our community Wanna push the experience further ? App is free to use, but premium offers are available that let you chat without having to worry about credits. submitted by /u/Fayerdd [link] [comments]  ( 8 min )
    AI — weekly megathread!
    This week in AI - partnered with aibrews.com feel free to follow their newsletter News & Insights ElevenLabs has launched AI Speech Classifier - an authentication tool that lets you upload any audio sample to identify if it contains ElevenLabs AI-generated audio [Details]. Nvidia Research presents SceneScape - a method to generate long-term walkthroughs in imaginary scenes just from an input text prompt [Details |Paper ]. Meta AI introduces the Image Joint Embedding Predictive Architecture (I-JEPA), a new AI model which learns from the world like humans and excels in computer vision tasks, while being more computationally efficient. It learns by creating an internal model of the outside world, which compares abstract representations of images (rather than comparing the pixels themsel…  ( 10 min )
    The AI singularity is nearer than we think?
    Hey folks, I’ve been mulling over something. Every other day I see some article about more progress in AI, and it just doesn’t seem to stop. Just look at the AI models today, they're getting massive, and their capabilities are already borderline scary. How much longer before the lines between AI and human blur, or even become indistinguishable? Take gpt-4 or Anthropic’s Claude, for instance, they're so much better at understanding and even generating human-like text. Not to mention, these models are conditioned to avoid certain topics and tasks. I don’t think people realize that AutoGPT doesn’t work well because OpenAI and Anthropic have trained their AI’s not to be autonomous. Could we possibly already be in the AI singularity? submitted by /u/Emotional_Ratio_3251 [link] [comments]  ( 8 min )
    IBM Research: The 100,000 Qubit Quantum-Centric Supercomputer of 2033
    submitted by /u/linebell [link] [comments]  ( 8 min )
    [Question/Discussion] Ideas on Whisper AI hallucinating
    I used Whisper AI (openai/whisper: Robust Speech Recognition via Large-Scale Weak Supervision (github.com)) to transcribe the Common Voice data set (Common Voice (mozilla.org)) for a language and noticed that the 'tiny' model hallucinates a lot, whereas the larger 'small' model almost does not hallucinate at all, and the 'base' model (which sits between 'tiny' and 'small' in size) hallucinates more than the 'small' model. Furthermore, the general performance of the 'small' model is better than both the 'tiny' and 'base' models. As a side note, the data instances in this data set are sentences worth about 5-10 seconds of audio. I am mostly interested in your thoughts on why a larger model does not necessarily perform better and may hallucinate more. I did not change the temperature or any other settings when transcribing. I can imagine a larger model might overfit, which could cause this phenomenon, but I would like to know what you think might be the cause of the lower performance with more hallucinations. As context: I am doing research for my master's thesis, so any ideas are welcome! submitted by /u/Rikker_33 [link] [comments]  ( 8 min )
    Using chatGPT as a teacher
    Hello everyone, I'm looking into the impact of ChatGPT as a teaching aid for educators. Please let me know how you're leveraging ChatGPT or other AI tools to reduce workload or save yourself from rote work. Also, what tasks do you use it for frequently, and are there any challenges? If you're willing to help, leave your answers here: https://forms.gle/3SRRToFtVDCaGMyv9 submitted by /u/Winter-Mud9223 [link] [comments]  ( 8 min )
    Custom-AI App
    I need your expertise! I’m working on creating sets of Earth Science practice questions with multiple-choice answers for high school students. Some of these sets require images/data tables/graphs that students will reference to answer the question. Essential Requirements: Questions should test the application of knowledge, not just recall. Question difficulty and format must be consistent. Answer options should be plausible, distinct, and similar in length and detail. Here's an example: INSTEAD OF "How do scientists calculate the age of a star?" (this is a content recall question) → "Scientists calculate the age of a star by using Hertzsprung-Russell diagrams. These diagrams have the temperature of stars plotted against their brightness. Using the following graph (insert image), approximate the age of a star that is x degrees in temperature and x in brightness." I’ve tried using Chat GPT-4, however I’m finding that it doesn’t remember the criteria for quality questions and frequently will give questions that rely upon recall. Additionally, Chat-GPT cannot generate images or graphs that I might use as stimuli to accompany the questions. I have basic programming knowledge and I’m wondering if it's possible to create an automated tool to generate these questions and answers, and incorporate relevant images. Which platforms or programming languages would be best to use for this task? submitted by /u/guppyguyco [link] [comments]  ( 8 min )
    What’s the best free AI text to speech that has unlimited characters
    I run a YouTube channel where I use text-to-speech, and I can't seem to find a good free platform with unlimited characters. submitted by /u/Disastrous_Loan753 [link] [comments]  ( 8 min )
  • Open

    Voicebox From Meta AI Gonna Change Voice Generation & Editing Forever - Can Eliminate ElevenLabs
    submitted by /u/CeFurkan [link] [comments]  ( 8 min )
    Training slow down for batch size > 1. A single operation is the issue. Help.
    Note: I am using an up-to-date version of PyTorch and running on GPU. When I train my model with batch size 1, it goes as expected. When I use batch size 2 (still fits in GPU) there is one specific ConvolutionBackward0 operation that suddenly takes WAAAAAY longer (I mean 10 times...). No other backward operations are affected. Does anyone have suggestions as to why this could be happening? It seems extremely counterintuitive that only one piece of the backward pass breaks. submitted by /u/teduck1 [link] [comments]  ( 8 min )
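    A standard way to pin down which kernel is at fault is torch.profiler; it is also worth checking whether cuDNN simply picked a slower convolution algorithm for the new input shape (cudnn.benchmark re-tunes per shape). A sketch, with `model` and `batch` standing in for the poster's own objects:

        import torch
        from torch.profiler import profile, ProfilerActivity

        torch.backends.cudnn.benchmark = True   # let cuDNN re-tune conv algorithms

        with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
            out = model(batch)                   # batch with the problematic batch size
            out.sum().backward()
        print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))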
    [HELP] Train
    hey guys, I want to fine-tune a Davinci 003 model to write in a specific writing style. The problem is I don't understand how to create the dataset for this. Let's say I want my model to write articles in a specific structure. What do I need to put as Prompt and Completion in the dataset? For example, I want it to write articles in a very specific structure: article introduction section; short explanation of the main topic: Every Jeep has its own specific codes which are used to notify you that something is wrong. For example, DTC codes like c123f or u1110 etc. Main keyword as a question: So, what does the C123F code on a Jeep mean? Short direct answer to the question: Jeep code C123F may come up due to a damaged steering column or intermediate shaft. It can also be due to incorrect positioning of the steering angle sensor or improper clock spring installation. Resetting the ECU, properly installing the clock spring, or replacing the clock spring can fix the issue. That was just the preview. If you want more details on this matter, keep reading this article. How do I train the model to write intros like the above? submitted by /u/buxrmp [link] [comments]  ( 8 min )
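    For the legacy prompt/completion fine-tuning format, each training line is one JSON object; a small Python sketch building such a record from the article structure described above (all field contents are placeholders, and the "###" separator and "END" stop sequence follow OpenAI's suggested conventions for that format):

        import json

        # Legacy completions fine-tuning: one JSON object per line; the prompt ends
        # with a fixed separator, the completion starts with a space and ends with
        # a stop sequence you also pass at inference time.
        record = {
            "prompt": "Topic: C123F code Jeep\n"
                      "Write an article introduction in my house style.\n\n###\n\n",
            "completion": " Short explanation of the main topic... "
                          "Main keyword as a question... "
                          "Short direct answer to the question...\n\nEND",
        }
        with open("train.jsonl", "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    Repeating this for many articles, with the same separator and stop sequence throughout, is what teaches the model the structure rather than the facts.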
  • Open

    Improving Subseasonal Forecasting with Machine Learning
    This content was previously published by Nature Portfolio and Springer Nature Communities on Nature Portfolio Earth and Environment Community. Improving our ability to forecast the weather and climate is of interest to all sectors of the economy and to government agencies from the local to the national level. Weather forecasts zero to ten days ahead and […] The post Improving Subseasonal Forecasting with Machine Learning appeared first on Microsoft Research.  ( 11 min )
  • Open

    Envisioning the future of computing
    MIT students share ideas, aspirations, and vision for how advances in computing stand to transform society in a competition hosted by the Social and Ethical Responsibilities of Computing.  ( 9 min )
  • Open

    Enhancing patient care with AI-powered health monitoring and remote patient management
    Healthcare has undergone consistent change. From surgeries conducted by sedating patients with opium and amputating infected parts, the technology has evolved to anesthesia and new and innovative ways of treating bacterial infections. We can now perform LASIK surgeries and have even separated conjoined twins joined at the head. With so much advancement,… Read More »Enhancing patient care with AI-powered health monitoring and remote patient management The post Enhancing patient care with AI-powered health monitoring and remote patient management appeared first on Data Science Central.  ( 22 min )
  • Open

    Quantifying Signal-to-Noise Ratio in High Variance, Low Reward Improvement Environments
    I am dealing with a class of environments / reward functions that only allow a very slight improvement of the mean reward but have relatively high variance. So basically, the distribution of rewards is very wide but shifts only slightly over the course of training. I know this because I have a pretty good MPC policy that I can run on the environment and see the improvement over the initial policy versus the variance of the rewards. The smaller the ratio of possible reward improvement over the variance of rewards, the harder it becomes for the RL algorithm to learn a good policy, which is plausible. Here is an example calculation for an environment; I had a hard time getting it to converge, and the problem is super sensitive to the hyperparameter settings. (Figure: rewards for the initial policy and the MPC policy, the standard deviation of rewards, and the ratio of possible improvement over std(rewards).) My question would be: is there an agreed way to quantify this signal-to-noise ratio or ratio of possible improvement? And is there literature investigating this problem, or do you have any experience of what would be a 'good' ratio? submitted by /u/flxh13 [link] [comments]  ( 8 min )
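    A direct way to compute the ratio described above (a minimal sketch; per-episode returns under each policy are assumed to be i.i.d. samples):

        import numpy as np

        def improvement_snr(returns_initial, returns_mpc):
            # Possible mean-reward improvement divided by the reward std.
            gap = np.mean(returns_mpc) - np.mean(returns_initial)
            return gap / np.std(returns_initial)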
  • Open

    SambaSafety automates custom R workload, improving driver safety with Amazon SageMaker and AWS Step Functions
    At SambaSafety, their mission is to promote safer communities by reducing risk through data insights. Since 1998, SambaSafety has been the leading North American provider of cloud–based mobility risk management software for organizations with commercial and non–commercial drivers. SambaSafety serves more than 15,000 global employers and insurance carriers with driver risk and compliance monitoring, online […]  ( 6 min )

  • Open

    Implementing Gradient Descent in PyTorch
    The gradient descent algorithm is one of the most popular techniques for training deep neural networks. It has many applications in fields such as computer vision, speech recognition, and natural language processing. While the idea of gradient descent has been around for decades, it’s only recently that it’s been applied to applications related to deep […] The post Implementing Gradient Descent in PyTorch appeared first on MachineLearningMastery.com.  ( 25 min )

  • Open

    Training a Linear Regression Model in PyTorch
    Linear regression is a simple yet powerful technique for predicting the values of variables based on other variables. It is often used for modeling relationships between two or more continuous variables, such as the relationship between income and age, or the relationship between weight and height. Likewise, linear regression can be used to predict continuous […] The post Training a Linear Regression Model in PyTorch appeared first on MachineLearningMastery.com.  ( 24 min )
    Making Linear Predictions in PyTorch
    Linear regression is a statistical technique for estimating the relationship between two variables. A simple example of linear regression is to predict the height of someone based on the square root of the person’s weight (that’s what BMI is based on). To do this, we need to find the slope and intercept of the line. […] The post Making Linear Predictions in PyTorch appeared first on MachineLearningMastery.com.  ( 21 min )

  • Open

    Loading and Providing Datasets in PyTorch
    Structuring the data pipeline in a way that it can be effortlessly linked to your deep learning model is an important aspect of any deep learning-based system. PyTorch packs everything to do just that. While in the previous tutorial, we used simple datasets, we’ll need to work with larger datasets in real world scenarios in […] The post Loading and Providing Datasets in PyTorch appeared first on MachineLearningMastery.com.  ( 20 min )

  • Open

    Using Dataset Classes in PyTorch
    In machine learning and deep learning problems, a lot of effort goes into preparing the data. Data is usually messy and needs to be preprocessed before it can be used for training a model. If the data is not prepared correctly, the model won’t be able to generalize well. Some of the common steps required […] The post Using Dataset Classes in PyTorch appeared first on MachineLearningMastery.com.  ( 21 min )

  • Open

    Calculating Derivatives in PyTorch
    Derivatives are one of the most fundamental concepts in calculus. They describe how changes in the variable inputs affect the function outputs. The objective of this article is to provide a high-level introduction to calculating derivatives in PyTorch for those who are new to the framework. PyTorch offers a convenient way to calculate derivatives for […] The post Calculating Derivatives in PyTorch appeared first on Machine Learning Mastery.  ( 20 min )

  • Open

    Two-Dimensional Tensors in Pytorch
    Two-dimensional tensors are analogous to two-dimensional matrices. Like a two-dimensional matrix, a two-dimensional tensor also has $n$ rows and columns. Let’s take a gray-scale image as an example, which is a two-dimensional matrix of numeric values, commonly known as pixels. Ranging from ‘0’ to ‘255’, each number represents a pixel intensity value. Here, […] The post Two-Dimensional Tensors in Pytorch appeared first on Machine Learning Mastery.  ( 21 min )

  • Open

    One-Dimensional Tensors in Pytorch
    PyTorch is an open-source deep learning framework based on Python language. It allows you to build, train, and deploy deep learning models, offering a lot of versatility and efficiency. PyTorch is primarily focused on tensor operations while a tensor can be a number, matrix, or a multi-dimensional array. In this tutorial, we will perform some […] The post One-Dimensional Tensors in Pytorch appeared first on Machine Learning Mastery.  ( 22 min )

  • Open

    365 Data Science courses free until November 21
    Sponsored Post   The unlimited access initiative presents a risk-free way to break into data science.     The online educational platform 365 Data Science launches the #21DaysFREE campaign and provides 100% free unlimited access to all content for three weeks. From November 1 to 21, you can take courses from renowned instructors and earn […] The post 365 Data Science courses free until November 21 appeared first on Machine Learning Mastery.  ( 15 min )

  • Open

    Attend the Data Science Symposium 2022, November 8 in Cincinnati
    Sponsored Post      Attend the Data Science Symposium 2022 on November 8 The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all day in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. The […] The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.  ( 10 min )

  • Open

    My family's unlikely homeschooling journey
    My husband Jeremy and I never intended to homeschool, and yet we have now, unexpectedly, committed to homeschooling long-term. Prior to the pandemic, we both worked full-time in careers that we loved and found meaningful, and we sent our daughter to a full-day Montessori school. Although I struggled with significant health issues, I felt unbelievably lucky and fulfilled in both my family life and my professional life. The pandemic upended my careful balance. Every family is different, with different needs, circumstances, and constraints, and what works for one may not work for others. My intention here is primarily to share the journey of my own (very privileged) family. Our unplanned introduction to homeschooling For the first year of the pandemic, most schools in California, where …  ( 7 min )

  • Open

    The Jupyter+git problem is now solved
    Jupyter notebooks don’t work with git by default. With nbdev2, the Jupyter+git problem has been totally solved. It provides a set of hooks which provide clean git diffs, solve most git conflicts automatically, and ensure that any remaining conflicts can be resolved entirely within the standard Jupyter notebook environment. To get started, follow the directions on Git-friendly Jupyter. Contents The Jupyter+git problem The solution The nbdev2 git merge driver The nbdev2 Jupyter save hook Background The result Postscript: other Jupyter+git tools ReviewNB An alternative solution: Jupytext nbdime The Jupyter+git problem Jupyter notebooks are a powerful tool for scientists, engineers, technical writers, students, teachers, and more. They provide an ideal notebook environment for interact…  ( 7 min )
2023-07-16T01:10:41.841Z osmosfeed 1.15.1